Preface


Autoregressive conditionally heteroscedastic (ARCH) models were introduced by Engle in an 
article published in Econometrica in the early 1980s (Engle, 1982). The proposed application in 
that article focused on macroeconomic data and one could not imagine, at that time, that the main 
field of application for these models would be finance. Since the mid-1980s and the introduction of 
generalized ARCH (or GARCH) models, these models have become extremely popular among both 
academics and practitioners. GARCH models led to a fundamental change in the approaches used
in finance, through an efficient modeling of volatility (or variability) of the prices of financial assets. 
In 2003, the Nobel Prize for Economics was jointly awarded to Robert F. Engle and Clive W.J. 
Granger ‘for methods of analyzing economic time series with time-varying volatility (ARCH)’. 

Since the late 1980s, numerous extensions of the initial ARCH models have been published (see 
Bollerslev, 2008, for a (tentatively) exhaustive list). The aim of the present volume is not to review 
all these models, but rather to provide a panorama, as wide as possible, of current research into 
the concepts and methods of this field. Along with their development in econometrics and finance 
journals, GARCH models and their extensions have given rise to new directions for research in 
probability and statistics. Numerous classes of nonlinear time series models have been suggested, 
but none of them has generated interest comparable to that in GARCH models. The interest of the 
academic world in these models is explained by the fact that they are simple enough to be usable 
in practice, but also rich in theoretical problems, many of them unsolved. 

This book is intended primarily for master’s students and junior researchers, in the hope of 
attracting them to research in applied mathematics, statistics or econometrics. For experienced 
researchers, this book offers a set of results and references allowing them to move towards one 
of the many topics discussed. Finally, this book is aimed at practitioners and users who may be 
looking for new methods, or may want to learn the mathematical foundations of known methods. 

Some parts of the text have been written for readers who are familiar with probability theory 
and with time series techniques. To make this book as self-contained as possible, we provide 
proofs of most theoretical results. On first reading, however, many proofs can
be omitted. Those sections or chapters that are the most mathematically sophisticated and can 
be skipped without loss of continuity are marked with an asterisk. We have illustrated the main 
techniques with numerical examples, using real or simulated data. Program codes allowing the 
experiments to be reproduced are provided in the text and on the authors’ web pages. In general, 
we have tried to maintain a balance between theory and applications. 

Readers wishing to delve more deeply into the concepts introduced in this book will find a 
large collection of exercises along with their solutions. Some of these complement the proofs given 
in the text. 

The book is organized as follows. Chapter 1 introduces the basics of stationary processes and
ARMA modeling. The rest of the book is divided into three parts. Part I deals with the standard
univariate GARCH model. The main probabilistic properties (existence of stationary solutions,
representations, properties of autocorrelations) are presented in Chapter 2. Chapter 3 deals with
complementary properties related to mixing, allowing us to characterize the decay of the time
dependence. Chapter 4 is devoted to temporal aggregation: it studies the impact of the observation
frequency on the properties of GARCH processes.

Part II is concerned with statistical inference. We begin in Chapter 5 by studying the problem 
of identifying an appropriate model a priori. Then we present different estimation methods, starting 
with the method of least squares in Chapter 6 which, although limited to ARCH models, offers the
advantage of simplicity. The central part of the statistical study is Chapter 7, devoted to the quasi-maximum
likelihood method. For these models, testing the nullity of coefficients is not standard and is the 
subject of Chapter 8. Optimality issues are discussed in Chapter 9, as well as alternative estimators 
allowing some of the drawbacks of standard methods to be overcome. 

Part III is devoted to extensions and applications of the standard model. In Chapter 10, models
allowing us to incorporate asymmetric effects in the volatility are discussed. There is no natural 
extension of GARCH models for vector series, and many multivariate formulations are presented 
in Chapter 11. Without carrying out an exhaustive statistical study, we consider the estimation of 
a particular class of models which appears to be of interest for applications. Chapter 12 presents 
applications to finance. We first study the link between GARCH and diffusion processes, when 
the time step between two observations converges to zero. Two applications to finance are then 
presented: risk measurement and the pricing of derivatives. 

Appendix A includes the probabilistic properties which are of most importance for the study 
of GARCH models. Appendix B contains results on autocorrelations and partial autocorrelations. 
Appendix C provides solutions to the end-of-chapter exercises. Finally, a set of problems and (in 
most cases) their solutions are provided in Appendix D. 

For more information, please visit the author’s website
http://perso.univ-lille3.fr/~cfrancq/Christian-Francq/book-GARCH.html.


Notation 


General notation

$:=$    ‘is defined as’
$x^+$, $x^-$    $\max\{x, 0\}$, $\max\{-x, 0\}$ (or $\min\{x, 0\}$ in Chapter 10)

Sets and spaces

$\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$    positive integers, integers, rational numbers, real numbers
$\mathbb{R}^+$    positive real line
$\mathbb{R}^d$    $d$-dimensional Euclidean space
$D^c$    complement of the set $D \subset \mathbb{R}^d$
$[a, b)$    half-closed interval

Matrices

$I_d$    $d$-dimensional identity matrix
$M_{p,q}(\mathbb{R})$    the set of $p \times q$ real matrices

Processes

iid    independent and identically distributed
iid(0, 1)    iid centered with unit variance
$(X_t)$ or $(X_t)_{t \in \mathbb{Z}}$    discrete-time process
$(\epsilon_t)$    GARCH process
$\sigma_t^2$    conditional variance or volatility
$(\eta_t)$    strong white noise with unit variance
$\kappa_\eta$    kurtosis coefficient of $\eta_t$
$L$ or $B$    lag operator
$\sigma\{X_s; s < t\}$ or $X_{t-1}$    sigma-field generated by the past of $X_t$

Functions

$\mathbb{1}_A(x)$    1 if $x \in A$, 0 otherwise
$[x]$    integer part of $x$
$\gamma_X$, $\rho_X$    autocovariance and autocorrelation functions of $(X_t)$
$\hat\gamma_X$, $\hat\rho_X$    sample autocovariance and autocorrelation

Probability

$\mathcal{N}(m, \Sigma)$    Gaussian law with mean $m$ and covariance matrix $\Sigma$
$\chi^2_d$    chi-square distribution with $d$ degrees of freedom
$\chi^2_d(\alpha)$    quantile of order $\alpha$ of the $\chi^2_d$ distribution
$\xrightarrow{\mathcal{L}}$    convergence in distribution
a.s.    almost surely
$v_n = o_P(u_n)$    $v_n/u_n \to 0$ in probability
$a \overset{o_P(1)}{=} b$    $a$ equals $b$ up to the stochastic order $o_P(1)$

Estimation

$J$    Fisher information matrix
$(\kappa_\eta - 1)J^{-1}$    asymptotic variance of the QML
$\theta_0$    true parameter value
$\Theta$    parameter set
$\theta$    element of the parameter set
$\hat\theta_n$, ...    estimators of $\theta_0$
$\sigma_t^2 = \sigma_t^2(\theta)$    volatility built with the value $\theta$
$\tilde\sigma_t^2 = \tilde\sigma_t^2(\theta)$    as $\sigma_t^2$ but with initial values
$\ell_t = \ell_t(\theta)$    $-2\log$(conditional variance of $\epsilon_t$)
$\tilde\ell_t = \tilde\ell_t(\theta)$    approximation of $\ell_t$, built with initial values
$\mathrm{Var}_{as}$, $\mathrm{Cov}_{as}$    asymptotic variance and covariance

Some abbreviations

ES    expected shortfall
FGLS    feasible generalized least squares
OLS    ordinary least squares
QML    quasi-maximum likelihood
RMSE    root mean square error
SACR    sample autocorrelation
SACV    sample autocovariance
SPAC    sample partial autocorrelation
VaR    value at risk


Classical Time Series Models 
and Financial Series 


Standard time series analysis rests on important concepts such as stationarity, autocorrelation,
white noise and innovation, and on a central family of models, the autoregressive moving average
(ARMA) models. We start by recalling their main properties and how they can be used. As we 
shall see, these concepts are insufficient for the analysis of financial time series. In particular, we 
shall introduce the concept of volatility, which is of crucial importance in finance. 

In this chapter, we also present the main stylized facts (unpredictability of returns, volatility 
clustering and hence predictability of squared returns, leptokurticity of the marginal distributions, 
asymmetries, etc.) concerning financial series. 


1.1 Stationary Processes 


Stationarity plays a central part in time series analysis, because it replaces in a natural 
way the hypothesis of independent and identically distributed (iid) observations in standard 
statistics. 

Consider a sequence of real random variables $(X_t)_{t \in \mathbb{Z}}$, defined on the same probability
space. Such a sequence is called a time series, and is an example of a discrete-time stochastic
process.

We begin by introducing two standard notions of stationarity. 


Definition 1.1 (Strict stationarity) The process $(X_t)$ is said to be strictly stationary if the vectors
$(X_1, \dots, X_k)'$ and $(X_{1+h}, \dots, X_{k+h})'$ have the same joint distribution, for any $k \in \mathbb{N}$ and
any $h \in \mathbb{Z}$.


The following notion may seem less demanding, because it only constrains the first two
moments of the variables $X_t$, but contrary to strict stationarity, it requires the existence of
such moments.

Definition 1.2 (Second-order stationarity) The process $(X_t)$ is said to be second-order
stationary if:

(i) $EX_t^2 < \infty$, $\forall t \in \mathbb{Z}$;
(ii) $EX_t = m$, $\forall t \in \mathbb{Z}$;
(iii) $\mathrm{Cov}(X_t, X_{t+h}) = \gamma_X(h)$, $\forall t, h \in \mathbb{Z}$.

The function $\gamma_X(\cdot)$ ($\rho_X(\cdot) := \gamma_X(\cdot)/\gamma_X(0)$) is called the autocovariance function (autocorrelation
function) of $(X_t)$.


The simplest example of a second-order stationary process is white noise. This process is 
particularly important because it allows more complex stationary processes to be constructed. 


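Definition 1.3 (Weak white noise) The process $(\epsilon_t)$ is called weak white noise if, for some positive
constant $\sigma^2$:

(i) $E\epsilon_t = 0$, $\forall t \in \mathbb{Z}$;
(ii) $E\epsilon_t^2 = \sigma^2$, $\forall t \in \mathbb{Z}$;
(iii) $\mathrm{Cov}(\epsilon_t, \epsilon_{t+h}) = 0$, $\forall t, h \in \mathbb{Z}$, $h \neq 0$.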


Remark 1.1 (Strong white noise) It should be noted that no independence assumption is made
in the definition of weak white noise. The variables at different dates are only uncorrelated, and
the distinction is particularly crucial for financial time series. It is sometimes necessary to replace
hypothesis (iii) by the stronger hypothesis

(iii') the variables $\epsilon_t$ and $\epsilon_{t+h}$ are independent and identically distributed.

The process $(\epsilon_t)$ is then said to be strong white noise.


Estimating Autocovariances 


Classical time series analysis is centered on the second-order structure of the processes. Gaussian
stationary processes are completely characterized by their mean and their autocovariance
function. For non-Gaussian processes, the mean and autocovariance give a first idea of the temporal
dependence structure. In practice, these moments are unknown and are estimated from a
realization of size $n$ of the series, denoted $X_1, \dots, X_n$. This step is preliminary to any construction
of an appropriate model. To estimate $\gamma(h)$, we generally use the sample autocovariance defined,
for $0 \le h < n$, by


$$\hat\gamma(h) = \frac{1}{n}\sum_{j=1}^{n-h}(X_j - \bar X)(X_{j+h} - \bar X) := \hat\gamma(-h),$$

where $\bar X = (1/n)\sum_{j=1}^{n} X_j$ denotes the sample mean. We similarly define the sample autocorrelation
function by $\hat\rho(h) = \hat\gamma(h)/\hat\gamma(0)$ for $|h| < n$.
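As a rough illustration, the estimators $\hat\gamma(h)$ and $\hat\rho(h)$ above can be coded directly. The following NumPy sketch is our own, not the authors' program; `sample_acov` and `sample_acf` are hypothetical names. It uses the $1/n$ normalization and the convention $\hat\gamma(-h) = \hat\gamma(h)$.

```python
import numpy as np

def sample_acov(x, h):
    """gamma_hat(h) = (1/n) * sum_{j=1}^{n-|h|} (X_j - Xbar)(X_{j+|h|} - Xbar)."""
    x = np.asarray(x, dtype=float)
    n, h = x.size, abs(h)  # gamma_hat(-h) is defined as gamma_hat(h)
    if h >= n:
        raise ValueError("the sample autocovariance is only defined for |h| < n")
    d = x - x.mean()  # centre at the sample mean Xbar
    return float(np.dot(d[: n - h], d[h:]) / n)  # divide by n, not by n - h

def sample_acf(x, max_lag):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0) for h = 0, ..., max_lag."""
    g0 = sample_acov(x, 0)
    return np.array([sample_acov(x, h) / g0 for h in range(max_lag + 1)])
```

For instance, `sample_acf([1.0, 2.0, 3.0, 4.0], 3)` gives $(1, 0.25, -0.3, -0.45)$ up to rounding. Note that the $1/n$ factor is applied even though only $n - h$ products enter the sum.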

The previous estimators have finite-sample bias but are asymptotically unbiased. There are 
other similar estimators of the autocovariance function with the same asymptotic properties (for 
instance, obtained by replacing $1/n$ by $1/(n - h)$). However, the estimator defined above is preferred
to the others because the matrix $(\hat\gamma(i - j))$ is positive semi-definite (see Brockwell and
Davis, 1991, p. 221). 
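The positive semi-definiteness argument can be checked numerically. In this sketch (our own illustration; `toeplitz_acov` and the flag `divide_by_n_minus_h` are hypothetical names), the matrix $(\hat\gamma(i-j))$ is built with either normalization. For the short series $(1, 0, -1)$, the $1/(n-h)$ choice produces a negative eigenvalue, whereas the $1/n$ choice does not.

```python
import numpy as np

def sample_acov(x, h, divide_by_n_minus_h=False):
    """Sample autocovariance at lag |h|, normalized by 1/n (default) or by 1/(n - |h|)."""
    x = np.asarray(x, dtype=float)
    n, h = x.size, abs(h)
    d = x - x.mean()
    s = np.dot(d[: n - h], d[h:])
    return s / (n - h) if divide_by_n_minus_h else s / n

def toeplitz_acov(x, divide_by_n_minus_h=False):
    """n x n matrix with entries gamma_hat(i - j), i, j = 0, ..., n - 1."""
    n = len(x)
    g = [sample_acov(x, k, divide_by_n_minus_h) for k in range(n)]
    return np.array([[g[abs(i - j)] for j in range(n)] for i in range(n)])

x = [1.0, 0.0, -1.0]
# 1/n: gamma_hat = (2/3, 0, -1/3), eigenvalues 1/3, 2/3, 1 (positive semi-definite)
eig_1_over_n = np.linalg.eigvalsh(toeplitz_acov(x))
# 1/(n - h): gamma_hat = (2/3, 0, -1), eigenvalues -1/3, 2/3, 5/3 (not positive semi-definite)
eig_1_over_n_minus_h = np.linalg.eigvalsh(toeplitz_acov(x, divide_by_n_minus_h=True))
```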

It is of course not recommended to use the sample autocovariances when h is close to n, because 
too few pairs $(X_j, X_{j+h})$ are available. Box, Jenkins and Reinsel (1994, p. 32) suggest that useful
estimates of the autocorrelations can only be made if, approximately, n > 50 and h < n/4. 
It is often of interest to know (for instance, in order to select an appropriate model) whether
some or all of the sample autocovariances are significantly different from 0. It is then necessary to
estimate the covariance structure of those sample autocovariances. We have the following result
(see Brockwell and Davis, 1991, pp. 222, 226).


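Theorem 1.1 (Bartlett’s formulas for a strong linear process) Let $(X_t)$ be a linear process
satisfying

$$X_t = \sum_{j=-\infty}^{\infty}\phi_j\epsilon_{t-j}, \qquad \sum_{j=-\infty}^{\infty}|\phi_j| < \infty,$$

where $(\epsilon_t)$ is a sequence of iid variables such that

$$E(\epsilon_t) = 0, \qquad E(\epsilon_t^2) = \sigma^2, \qquad E(\epsilon_t^4) = \kappa_\epsilon\sigma^4 < \infty.$$

Appropriately normalized, the sample autocovariances and autocorrelations are asymptotically normal,
with asymptotic variances given by the Bartlett formulas:

$$\lim_{n\to\infty} n\,\mathrm{Cov}\{\hat\gamma(h), \hat\gamma(k)\} = \sum_{i=-\infty}^{\infty}\{\gamma(i)\gamma(i+k-h) + \gamma(i+k)\gamma(i-h)\} + (\kappa_\epsilon - 3)\gamma(h)\gamma(k) \qquad (1.1)$$

and

$$\lim_{n\to\infty} n\,\mathrm{Cov}\{\hat\rho(h), \hat\rho(k)\} = \sum_{i=-\infty}^{\infty}\rho(i)\{2\rho(h)\rho(k)\rho(i) - 2\rho(h)\rho(i+k) - 2\rho(k)\rho(i+h) + \rho(i+k-h) + \rho(i-k-h)\}. \qquad (1.2)$$

Formula (1.2) still holds under the assumptions

$$E\epsilon_t^2 < \infty, \qquad \sum_{j=-\infty}^{\infty}|j|\phi_j^2 < \infty.$$

In particular, if $X_t = \epsilon_t$ and $E\epsilon_t^2 < \infty$, we have

$$\sqrt{n}\begin{pmatrix}\hat\rho(1)\\ \vdots\\ \hat\rho(h)\end{pmatrix} \xrightarrow{\mathcal{L}} \mathcal{N}(0, I_h).$$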
The assumptions of this theorem are demanding, because they require a strong white noise $(\epsilon_t)$. An
extension allowing the strong linearity assumption to be relaxed is proposed in Appendix B.2. For
many nonlinear processes, in particular the ARCH processes studied in this book, the asymptotic
covariance of the sample autocovariances can be very different from (1.1) (Exercises 1.6 and 1.8).
Using the standard Bartlett formula can lead to specification errors (see Chapter 5).


1.2 ARMA and ARIMA Models 


The aim of time series analysis is to construct a model for the underlying stochastic process. This 
model is then used for analyzing the causal structure of the process or to obtain optimal predictions. 
The class of ARMA models is the most widely used for the prediction of second-order
stationary processes. These models can be viewed as a natural consequence of a fundamental
result due to Wold (1938), which can be stated as follows: any centered, second-order stationary,
and ‘purely nondeterministic’ process¹ admits an infinite moving-average representation of the form


$$X_t = \epsilon_t + \sum_{i=1}^{\infty} c_i\epsilon_{t-i}, \qquad (1.3)$$

where $(\epsilon_t)$ is the linear innovation process of $(X_t)$, that is

$$\epsilon_t = X_t - E(X_t \mid H_X(t-1)), \qquad (1.4)$$

where $H_X(t-1)$ denotes the Hilbert space generated by the random variables $X_{t-1}, X_{t-2}, \dots$,²
and $E(X_t \mid H_X(t-1))$ denotes the orthogonal projection of $X_t$ onto $H_X(t-1)$. The sequence of
coefficients $(c_i)$ is such that $\sum_i c_i^2 < \infty$. Note that $(\epsilon_t)$ is a weak white noise.

Truncating the infinite sum in (1.3), we obtain the process

$$X_t(q) = \epsilon_t + \sum_{i=1}^{q} c_i\epsilon_{t-i},$$

called a moving average process of order $q$, or MA($q$). We have

$$\|X_t(q) - X_t\|_2^2 = E\epsilon_t^2\sum_{i>q} c_i^2 \to 0, \quad \text{as } q \to \infty.$$

It follows that the set of all finite-order moving averages is dense in the set of second-order 
stationary and purely nondeterministic processes. The class of ARMA models is often preferred 
to the MA models for parsimony reasons, because they generally require fewer parameters. 
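To make the density argument and the parsimony remark concrete, consider, as an assumption for this illustration only, an AR(1) process $X_t = \phi X_{t-1} + \epsilon_t$ with $|\phi| < 1$, whose representation (1.3) has $c_i = \phi^i$. The tail $\sum_{i>q} c_i^2$ is then geometric, and the sketch below (hypothetical helper names) finds how many MA terms are needed for the truncation error to fall below a given fraction of $\mathrm{Var}\,X_t$.

```python
def ma_truncation_error(phi, q, sigma2=1.0):
    """E{X_t(q) - X_t}^2 = sigma2 * sum_{i>q} phi^(2i), summed in closed form (|phi| < 1)."""
    return sigma2 * phi ** (2 * (q + 1)) / (1 - phi ** 2)

def smallest_q(phi, rel_tol, sigma2=1.0):
    """Smallest q such that the MA(q) truncation error is below rel_tol * Var(X_t)."""
    var_x = sigma2 / (1 - phi ** 2)  # Var(X_t) = sigma2 * sum_{i>=0} phi^(2i)
    q = 0
    while ma_truncation_error(phi, q, sigma2) >= rel_tol * var_x:
        q += 1
    return q

q_needed = smallest_q(0.9, 0.01)  # relative error phi^(2(q+1)) < 1% first holds at q = 21
```

With $\phi = 0.9$, an MA approximation needs 21 coefficients to capture 99% of the variance, whereas the AR(1) form uses a single one.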


Definition 1.4 (ARMA(p, q) process) A second-order stationary process $(X_t)$ is called
ARMA$(p, q)$, where $p$ and $q$ are integers, if there exist real coefficients $c, a_1, \dots, a_p, b_1, \dots, b_q$
such that

$$\forall t \in \mathbb{Z}, \quad X_t + \sum_{i=1}^{p} a_iX_{t-i} = c + \epsilon_t + \sum_{j=1}^{q} b_j\epsilon_{t-j}, \qquad (1.5)$$

where $(\epsilon_t)$ is the linear innovation process of $(X_t)$.


This definition entails constraints on the zeros of the autoregressive and moving average polynomials,
$a(z) = 1 + \sum_{i=1}^{p} a_iz^i$ and $b(z) = 1 + \sum_{j=1}^{q} b_jz^j$ (Exercise 1.9). The main attraction of
this model, and of the representations obtained by successively inverting the polynomials $a(\cdot)$ and
$b(\cdot)$, is that it provides a framework for deriving the optimal linear predictions of the process in
a much simpler way than by only assuming second-order stationarity.

Many economic series display trends, making the stationarity assumption unrealistic. Such
trends often vanish when the series is differenced, once or several times. Let $\Delta X_t = X_t - X_{t-1}$
denote the first-difference series, and let $\Delta^d X_t = \Delta(\Delta^{d-1}X_t)$ (with $\Delta^0 X_t = X_t$) denote the
differences of order $d$.


¹ A stationary process $(X_t)$ is said to be purely nondeterministic if and only if $\bigcap_{n=-\infty}^{+\infty} H_X(n) = \{0\}$, where
$H_X(n)$ denotes, in the Hilbert space of the real, centered, and square integrable variables, the subspace
constituted by the limits of the linear combinations of the variables $X_{n-i}$, $i \ge 0$. Thus, for a purely nondeterministic
(or regular) process, the linear past, sufficiently far away in the past, is of no use in predicting future values.
See Brockwell and Davis (1991, pp. 187-189) or Azencott and Dacunha-Castelle (1984) for more details.

² In this representation, the equivalence class $E(X_t \mid H_X(t-1))$ is identified with a random variable.


Definition 1.5 (ARIMA(p, d, q) process) Let $d$ be a positive integer. The process $(X_t)$ is said to
be an ARIMA$(p, d, q)$ process if, for $k = 0, \dots, d-1$, the processes $(\Delta^k X_t)$ are not second-order
stationary, and $(\Delta^d X_t)$ is an ARMA$(p, q)$ process.


The simplest ARIMA process is the ARIMA(0, 1, 0), also called the random walk, satisfying

$$X_t = \epsilon_t + \epsilon_{t-1} + \dots + \epsilon_1 + X_0, \qquad t \ge 1,$$

where $(\epsilon_t)$ is a weak white noise.
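A quick simulation illustrates why the random walk requires $d = 1$: differencing once recovers the noise exactly, while the variance of $X_t - X_0$ grows linearly in $t$, so $(X_t)$ itself is not second-order stationary. This sketch assumes, for simplicity, a Gaussian strong white noise (a special case of the weak white noise in the definition); `random_walk` is a hypothetical name.

```python
import numpy as np

def random_walk(n, x0=0.0, sigma=1.0, rng=None):
    """X_t = X_0 + eps_1 + ... + eps_t, t = 1, ..., n, with eps_t iid N(0, sigma^2) (an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = sigma * rng.standard_normal(n)
    return x0 + np.cumsum(eps), eps

rng = np.random.default_rng(0)
x, eps = random_walk(500, x0=10.0, rng=rng)
dx = np.diff(np.concatenate(([10.0], x)))  # Delta X_t = X_t - X_{t-1}, t = 1, ..., 500

# Var(X_t - X_0) = t * sigma^2: estimate it at t = 100 across 20,000 independent paths
paths = np.cumsum(rng.standard_normal((20_000, 100)), axis=1)
var_t100 = paths[:, -1].var()  # should be close to 100
```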

For statistical convenience, ARMA (and ARIMA) models are generally used under stronger
assumptions on the noise than that of weak white noise. Strong ARMA refers to the ARMA model
of Definition 1.4 when $\epsilon_t$ is assumed to be a strong white noise. This additional assumption allows
us to use convenient statistical tools developed in this framework, but considerably reduces the
generality of the ARMA class. Indeed, assuming a strong ARMA is tantamount to assuming that
(i) the optimal predictions of the process are linear ($(\epsilon_t)$ being the strong innovation of $(X_t)$) and
(ii) the amplitudes of the prediction intervals depend on the horizon but not on the observations.
We shall see in the next section how restrictive this assumption can be, in particular for financial
time series modeling.

The orders (p,q) of an ARMA process are fully characterized through its autocorrelation 
function (see Brockwell and Davis, 1991, pp. 89—90, for a proof). 


Theorem 1.2 (Characterization of an ARMA process) Let $(X_t)$ denote a second-order stationary
process. We have

$$\rho(h) + \sum_{i=1}^{p} a_i\rho(h-i) = 0, \qquad \text{for all } |h| > q,$$

if and only if $(X_t)$ is an ARMA$(p, q)$ process.
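Theorem 1.2 can be checked on a causal ARMA(1,1) example. The sketch below is our own (`arma11_acf` is a hypothetical helper; it assumes $c = 0$ and $|a_1| < 1$). It computes the theoretical autocorrelations of $X_t + a_1X_{t-1} = \epsilon_t + b_1\epsilon_{t-1}$ from the MA($\infty$) weights $\psi_0 = 1$, $\psi_1 = b_1 - a_1$, $\psi_j = -a_1\psi_{j-1}$ ($j \ge 2$), truncated at a large number of terms, and verifies that $\rho(h) + a_1\rho(h-1)$ vanishes for $h > q = 1$ but not for $h = 1$.

```python
import numpy as np

def arma11_acf(a1, b1, max_lag, n_terms=2000):
    """Theoretical ACF of X_t + a1*X_{t-1} = eps_t + b1*eps_{t-1}, via truncated MA(infinity) weights."""
    if abs(a1) >= 1:
        raise ValueError("this sketch assumes a causal model, |a1| < 1")
    psi = np.empty(n_terms)
    psi[0], psi[1] = 1.0, b1 - a1
    for j in range(2, n_terms):
        psi[j] = -a1 * psi[j - 1]
    # gamma(h) / sigma^2 = sum_j psi_j * psi_{j+h}
    gamma = np.array([np.dot(psi[: n_terms - h], psi[h:]) for h in range(max_lag + 1)])
    return gamma / gamma[0]

a1, b1 = -0.7, 0.4  # i.e. X_t = 0.7 X_{t-1} + eps_t + 0.4 eps_{t-1}
rho = arma11_acf(a1, b1, max_lag=10)
resid = rho[1:] + a1 * rho[:-1]  # rho(h) + a1 * rho(h - 1), h = 1, ..., 10
```

Here `resid[0]` (lag 1) is about 0.119, while the remaining entries are zero up to rounding, as the theorem predicts with $p = q = 1$.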
To close this section, we summarize the method for time series analysis proposed in the famous
book by Box and Jenkins (1970). To simplify the presentation, we do not consider seasonal series, for
which SARIMA models can be used.


Box-Jenkins Methodology 


The aim of this methodology is to find the most appropriate ARIMA$(p, d, q)$ model and to use it
for forecasting. It uses an iterative six-stage scheme:

(i) a priori identification of the differencing order $d$ (or choice of another transformation);
(ii) a priori identification of the orders $p$ and $q$;
(iii) estimation of the parameters ($a_1, \dots, a_p$, $b_1, \dots, b_q$ and $\sigma^2 = \mathrm{Var}\,\epsilon_t$);
(iv) validation;
(v) choice of a model;
(vi) prediction.


Although many unit root tests have been introduced in the last 30 years, step (i) is still essentially
based on examining the graph of the series. If the data exhibit apparent deviations from stationarity,
it will not be appropriate to choose $d = 0$. For instance, if the amplitude of the variations tends
to increase, the assumption of constant variance can be questioned. This may be an indication
that the underlying process is heteroscedastic.³ If a regular linear trend is observed, positive or
negative, it can be assumed that the underlying process is such that $EX_t = at + b$ with $a \neq 0$. If
this assumption is correct, the first-difference series $\Delta X_t = X_t - X_{t-1}$ should not show any trend
($E\Delta X_t = a$) and could be stationary. If no other sign of nonstationarity can be detected (such
as heteroscedasticity), the choice $d = 1$ seems suitable. The random walk (whose sample paths
may resemble the graph of Figure 1.1) is another example where $d = 1$ is required, although this
process does not have any deterministic trend.

Figure 1.1 CAC 40 index for the period from March 1, 1990 to October 15, 2008 (4702
observations). [Plot: index value (2000 to 7000) against date.]

Step (ii) is more problematic. The primary tool is the sample autocorrelation function. If,
for instance, we observe that $\hat\rho(1)$ is far away from 0 but that, for any $h > 1$, $\hat\rho(h)$ is close to
0,⁴ then, from Theorem 1.1, it is plausible that $\rho(1) \neq 0$ and $\rho(h) = 0$ for all $h > 1$. In this
case, Theorem 1.2 entails that $(X_t)$ is an MA(1) process. To identify AR processes, the partial
autocorrelation function (see Appendix B.1) plays an analogous role. For mixed models (that is,
ARMA$(p, q)$ with $pq \neq 0$), more sophisticated statistics can be used, as will be seen in Chapter 5.
Step (ii) often results in the selection of several candidates $(p_1, q_1), \dots, (p_k, q_k)$ for the ARMA
orders. These $k$ models are estimated in step (iii), using, for instance, the least-squares method.
The aim of step (iv) is to gauge whether the estimated models are reasonably compatible with the data.
An important part of the procedure is to examine the residuals which, if the model is satisfactory,
should have the appearance of white noise. The correlograms are examined and portmanteau tests
are used to decide whether the residuals are sufficiently close to white noise. These tools will be described
in detail in Chapter 5. When the tests on the residuals fail to reject the model, the significance of
the estimated coefficients is studied. Testing the nullity of coefficients sometimes allows the model
to be simplified. This step may lead to rejection of all the estimated models, or to consideration
of other models, in which case we are brought back to step (i) or (ii). If several models pass the
validation step (iv), selection criteria can be used, the most popular being the Akaike (AIC) and
Bayesian (BIC) information criteria. Complementing these criteria, the predictive properties of the
models can be considered: different models can lead to almost equivalent predictive formulas. The
parsimony principle would thus lead us to choose the simplest model, the one with the fewest
parameters. Other considerations can also come into play: for instance, models frequently involve
a lagged variable of order 12 for monthly data, but this would seem less natural for weekly data.
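The identification step (ii), with the normalization of footnote 4, can be illustrated on simulated data. In this toy sketch (our assumptions: an MA(1) with $b_1 = 0.6$, Gaussian noise, $n = 2000$; not the authors' code), $\hat\rho(1)$ should be close to $b_1/(1+b_1^2) \approx 0.441$, and the statistics $\sqrt{n}|\hat\rho(h)|/\sqrt{1 + 2\hat\rho^2(1)}$, $h = 2, \dots, 10$, should mostly stay below 1.96, the 95% quantile of the $|\mathcal{N}(0,1)|$ law.

```python
import numpy as np

def sample_acf(x, max_lag):
    """rho_hat(h), h = 0, ..., max_lag, with the 1/n normalization."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    n = d.size
    g = np.array([np.dot(d[: n - h], d[h:]) / n for h in range(max_lag + 1)])
    return g / g[0]

rng = np.random.default_rng(1)
n, b1 = 2000, 0.6
eps = rng.standard_normal(n + 1)
x = eps[1:] + b1 * eps[:-1]  # MA(1): X_t = eps_t + b1 * eps_{t-1}

rho_hat = sample_acf(x, 10)
# Footnote-4 statistics for h = 2, ..., 10: roughly |N(0,1)| draws if the MA(1) hypothesis holds
stats = np.sqrt(n) * np.abs(rho_hat[2:]) / np.sqrt(1 + 2 * rho_hat[1] ** 2)
n_above = int(np.sum(stats > 1.96))  # expected to be small (about 5% of 9 lags)
```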


³ In contrast, a process such that $\mathrm{Var}\,X_t$ is constant is called (marginally) homoscedastic.
⁴ More precisely, for $h > 1$, $\sqrt{n}|\hat\rho(h)|/\sqrt{1 + 2\hat\rho^2(1)}$ is a plausible realization of the $|\mathcal{N}(0, 1)|$ distribution.
If the model is appropriate, step (vi) allows us to easily compute the best linear predictions $\hat X_t(h)$
at horizon $h = 1, 2, \dots$. Recall that these linear predictions do not necessarily lead to minimal
quadratic errors. Nonlinear models, or nonparametric methods, sometimes produce more accurate
predictions. Finally, the interval predictions obtained in step (vi) of the Box–Jenkins methodology
are based on Gaussian assumptions. Their width does not depend on the data, which, as we shall
see, is not appropriate for financial series.


1.3 Financial Series 


Modeling financial time series is a complex problem. This complexity is not only due to the variety
of the series in use (stocks, exchange rates, interest rates, etc.), to the importance of the observation
frequency (second, minute, hour, day, etc.) or to the availability of very large data sets. It is
mainly due to the existence of statistical regularities (stylized facts) which are common to a large
number of financial series and are difficult to reproduce artificially using stochastic models.

Most of these stylized facts were put forward in a paper by Mandelbrot (1963). Since then, 
they have been documented, and completed, by many empirical studies. They can be observed 
more or less clearly depending on the nature of the series and its frequency. The properties that 
we now present are mainly concerned with daily stock prices. 

Let $p_t$ denote the price of an asset at time $t$ and let $\epsilon_t = \log(p_t/p_{t-1})$ be the continuously
compounded or log return (also simply called the return). The series $(\epsilon_t)$ is often close to the series
of relative price variations $r_t = (p_t - p_{t-1})/p_{t-1}$, since $\epsilon_t = \log(1 + r_t)$. In contrast to the prices,
the returns or relative prices do not depend on monetary units, which facilitates comparisons between
assets. The following properties have been amply commented upon in the financial literature.


(i) Nonstationarity of price series. Sample paths of prices are generally close to a random
walk without intercept (see the CAC index series⁵ displayed in Figure 1.1). On the other
hand, sample paths of returns are generally compatible with the second-order stationarity
assumption. For instance, Figures 1.2 and 1.3 show that the returns of the CAC index
oscillate around zero. The oscillations vary a great deal in magnitude, but are almost constant
on average over long subperiods. The recent extreme volatility of prices, induced by the
financial crisis of 2008, is worth noting.

Figure 1.2 CAC 40 returns (March 2, 1990 to October 15, 2008). August 19, 1991, Soviet Putsch
attempt; September 11, 2001, fall of the Twin Towers; January 21, 2008, effect of the subprime
mortgage crisis; October 6, 2008, effect of the financial crisis. [Plot: return against date.]

Figure 1.3 Returns of the CAC 40 (January 2, 2008 to October 15, 2008).

⁵ The CAC 40 index is a linear combination of a selection of 40 shares on the Paris Stock Exchange (CAC
stands for ‘Cotations Assistées en Continu’).


(ii) Absence of autocorrelation for the price variations. The series of price variations generally
displays small autocorrelations, making it close to a white noise. This is illustrated for the
CAC in Figure 1.4(a). The classical significance bands are used here, as an approximation,
but we shall see in Chapter 5 that they must be corrected when the noise is not independent.
Note that for intraday series, with very small time intervals between observations (measured
in minutes or seconds), significant autocorrelations can be observed due to so-called
microstructure effects.

Figure 1.4 Sample autocorrelations of (a) returns and (b) squared returns of the CAC 40
(January 2, 2008 to October 15, 2008). [Plot: two panels, vertical axis ‘Autocorrelation’.]


(iii) Autocorrelations of the squared price returns. Squared returns ($\epsilon_t^2$) or absolute returns ($|\epsilon_t|$)
are generally strongly autocorrelated (see Figure 1.4(b)). This property is not incompatible
with the white noise assumption for the returns, but shows that the white noise is not strong.


(iv) Volatility clustering. Large absolute returns $|\epsilon_t|$ tend to appear in clusters. This property is
generally visible on the sample paths (as in Figure 1.3). Turbulent (high-volatility) subperiods
are followed by quiet (low-volatility) periods. These subperiods are recurrent but do not
appear in a periodic way (which might contradict the stationarity assumption). In other
words, volatility clustering is not incompatible with a homoscedastic (i.e. with a constant
variance) marginal distribution for the returns.


(v) Fat-tailed distributions. When the empirical distribution of daily returns is drawn, one can
generally observe that it does not resemble a Gaussian distribution. Classical tests typically
lead to rejection of the normality assumption at any reasonable level. More precisely, the
densities have fat tails (decreasing to zero more slowly than $\exp(-x^2/2)$) and are sharply
peaked at zero: they are called leptokurtic. A measure of the leptokurticity is the kurtosis
coefficient, defined as the ratio of the sample fourth-order moment to the squared sample
variance. Asymptotically equal to 3 for Gaussian iid observations, this coefficient is much
greater than 3 for returns series. When the time interval over which the returns are computed
increases, leptokurticity tends to vanish and the empirical distributions get closer to a
Gaussian. Monthly returns, for instance, defined as the sum of daily returns over the month,
have a distribution that is much closer to the normal than daily returns. Figure 1.5 compares
a kernel estimator of the density of the CAC returns with a Gaussian density. The peak
around zero appears clearly, but the thickness of the tails is more difficult to visualize.

Figure 1.5 Kernel estimator of the CAC 40 returns density (solid line) and density of a Gaussian
with mean and variance equal to the sample mean and variance of the returns (dotted line).
[Plot: density against return, horizontal axis from -10 to 10.]

Table 1.1 Sample autocorrelations of returns $\epsilon_t$ (CAC 40 index, January 2, 2008 to October 15,
2008), of absolute returns $|\epsilon_t|$, sample correlations between $\epsilon_{t-h}^+$ and $|\epsilon_t|$, and between $-\epsilon_{t-h}^-$ and
$|\epsilon_t|$.

h                                                1       2       3       4       5       6       7
$\hat\rho_\epsilon(h)$                         -0.012  -0.014  -0.047   0.025  -0.043  -0.023  -0.014
$\hat\rho_{|\epsilon|}(h)$                      0.175   0.229   0.235   0.200   0.218   0.212   0.203
$\hat\rho(\epsilon_{t-h}^+, |\epsilon_t|)$      0.038   0.059   0.051   0.055   0.059   0.109   0.061
$\hat\rho(-\epsilon_{t-h}^-, |\epsilon_t|)$     0.160   0.200   0.215   0.173   0.190   0.136   0.173

We use here the notation $\epsilon_t^+ = \max(\epsilon_t, 0)$ and $\epsilon_t^- = \min(\epsilon_t, 0)$.


(vi) Leverage effects. The so-called leverage effect was noted by Black (1976), and involves
an asymmetry of the impact of past positive and negative values on the current volatility.
Negative returns (corresponding to price decreases) tend to increase volatility by a larger
amount than positive returns (price increases) of the same magnitude. Empirically, a positive
correlation is often detected between $\epsilon_t^+ = \max(\epsilon_t, 0)$ and $|\epsilon_{t+h}|$ (a price increase should
entail future volatility increases), but, as shown in Table 1.1, this correlation is generally
less than that between $-\epsilon_t^- = \max(-\epsilon_t, 0)$ and $|\epsilon_{t+h}|$.


(vii) Seasonality. Calendar effects are also worth mentioning. The day of the week and the proximity
of holidays, among other seasonalities, may have significant effects on returns. Following a
period of market closure, volatility tends to increase, reflecting the information accumulated
during this break. However, it can be observed that the increase is less than if the information
had accumulated at constant speed. The seasonal effect is also very pronounced for intraday
series.
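The summary statistics behind items (i), (v) and (vi) are easy to compute. The sketch below is ours: `log_returns`, `sample_kurtosis` and `leverage_correlations` are hypothetical names, the correlations are plain Pearson correlations over the overlapping sample (which may differ slightly from the exact estimator behind Table 1.1), and the data are simulated rather than CAC 40 returns. For Gaussian noise the kurtosis should be near 3 and both correlations near 0; a Student $t$ with 10 degrees of freedom has theoretical kurtosis $3 + 6/(10 - 4) = 4$.

```python
import numpy as np

def log_returns(prices):
    """eps_t = log(p_t / p_{t-1})."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def sample_kurtosis(e):
    """Ratio of the sample fourth-order central moment to the squared sample variance."""
    d = np.asarray(e, dtype=float) - np.mean(e)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

def leverage_correlations(e, h):
    """Correlations of |e_t| with e_{t-h}^+ = max(e_{t-h}, 0) and with -e_{t-h}^- = max(-e_{t-h}, 0)."""
    e = np.asarray(e, dtype=float)
    lagged, current = e[:-h], np.abs(e[h:])
    pos_part = np.maximum(lagged, 0.0)
    neg_part = np.maximum(-lagged, 0.0)
    return np.corrcoef(pos_part, current)[0, 1], np.corrcoef(neg_part, current)[0, 1]

rng = np.random.default_rng(0)
gauss = rng.standard_normal(200_000)
heavy = rng.standard_t(10, size=200_000)  # Student t(10), leptokurtic
prices = 100.0 * np.exp(np.cumsum(0.01 * gauss))  # hypothetical price path
rets = log_returns(prices)  # equals 0.01 * gauss[1:] up to rounding
corr_pos, corr_neg = leverage_correlations(gauss, 1)
```

On real daily returns one would expect `sample_kurtosis` well above 3 and `corr_neg` larger than `corr_pos`, as in Table 1.1.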


1.4 Random Variance Models 


The previous properties illustrate the difficulty of financial series modeling. Any satisfactory sta- 
tistical model for daily returns must be able to capture the main stylized facts described in the 
previous section. Of particular importance are the leptokurticity, the unpredictability of returns, and 
the existence of positive autocorrelations in the squared and absolute returns. Classical formulations 
(such as ARMA models) centered on the second-order structure are inappropriate. Indeed, the 
second-order structure of most financial time series is close to that of white noise. 

The fact that large absolute returns tend to be followed by large absolute returns (whatever
the sign of the price variations) is hardly compatible with the assumption of constant conditional
variance. This phenomenon is called conditional heteroscedasticity:

$$\mathrm{Var}(\epsilon_t \mid \epsilon_{t-1}, \epsilon_{t-2}, \dots) \neq \text{const}.$$


Conditional heteroscedasticity is perfectly compatible with stationarity (in the strict and second- 
order senses), just as the existence of a nonconstant conditional mean is compatible with station- 
arity. The GARCH processes studied in this book will amply illustrate this point. 

The models introduced in the econometric literature to account for the very specific nature 
of financial series (price variations or log-returns, interest rates, etc.) are generally written in the 
multiplicative form 


$$\epsilon_t = \sigma_t\eta_t, \tag{1.6}$$
where $(\eta_t)$ and $(\sigma_t)$ are real processes such that:

(i) $\sigma_t$ is measurable with respect to a σ-field, denoted $\mathcal{F}_{t-1}$;

(ii) $(\eta_t)$ is an iid centered process with unit variance, $\eta_t$ being independent of $\mathcal{F}_{t-1}$ and $\sigma(\epsilon_u;\ u < t)$;

(iii) $\sigma_t > 0$.


This formulation implies that the sign of the current price variation (that is, the sign of $\epsilon_t$) is that
of $\eta_t$, and is independent of past price variations. Moreover, if the first two conditional moments
of $\epsilon_t$ exist, they are given by

$$E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0, \qquad E(\epsilon_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2.$$


The random variable $\sigma_t$ is called the volatility⁶ of $\epsilon_t$.
It may also be noted that (under existence assumptions)

$$E(\epsilon_t) = E(\sigma_t)E(\eta_t) = 0$$

and

$$\operatorname{Cov}(\epsilon_t, \epsilon_{t-h}) = E(\eta_t)E(\sigma_t\epsilon_{t-h}) = 0, \qquad \forall h > 0,$$

which makes $(\epsilon_t)$ a weak white noise. The series of squares, on the other hand, generally has
nonzero autocovariances: $(\epsilon_t)$ is thus not a strong white noise.
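The kurtosis coefficient of $\epsilon_t$, if it exists, is related to that of $\eta_t$, denoted $\kappa_\eta$, by

$$\kappa_\epsilon := \frac{E(\epsilon_t^4)}{\{E(\epsilon_t^2)\}^2} = \frac{E(\sigma_t^4)}{\{E(\sigma_t^2)\}^2}\,\kappa_\eta = \kappa_\eta + \kappa_\eta\,\frac{\operatorname{Var}(\sigma_t^2)}{\{E(\sigma_t^2)\}^2}. \tag{1.7}$$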


This formula shows that the leptokurticity of financial time series can be taken into account in two
different ways: either by using a leptokurtic distribution for the iid sequence $(\eta_t)$, or by specifying
a process $(\sigma_t^2)$ with high variability.
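Formula (1.7) is easy to check numerically. The sketch below assumes, purely for illustration, that $\sigma_t = \exp(0.3 Z_t)$ with $Z_t$ iid $\mathcal{N}(0, 1)$, independent of Gaussian $\eta_t$ (so $\kappa_\eta = 3$); for this choice $E(\sigma_t^4)/\{E(\sigma_t^2)\}^2 = e^{0.36}$, which gives an exact value to compare with. The helper `kurtosis` is a hypothetical name.

```python
import numpy as np


def kurtosis(x):
    """Kurtosis coefficient E(x^4) / {E(x^2)}^2 of a centered sample."""
    x = np.asarray(x, dtype=float)
    return np.mean(x**4) / np.mean(x**2) ** 2


rng = np.random.default_rng(1)
n = 1_000_000
sigma = np.exp(0.3 * rng.standard_normal(n))   # illustrative volatility, independent of eta
eta = rng.standard_normal(n)                   # eta_t ~ N(0, 1), so kappa_eta = 3
eps = sigma * eta

kappa_eta = 3.0
s2 = sigma**2
rhs = kappa_eta + kappa_eta * np.var(s2) / np.mean(s2) ** 2   # last expression in (1.7)
lhs = kurtosis(eps)                                            # sample kurtosis of eps_t
exact = kappa_eta * np.exp(0.36)   # E(sigma^4)/{E(sigma^2)}^2 = exp(0.36) for this choice
print(f"sample kurtosis {lhs:.3f}, formula {rhs:.3f}, exact {exact:.3f}")
```

The excess over $\kappa_\eta = 3$ comes entirely from the variability of $\sigma_t^2$, which is the second route to leptokurticity mentioned above.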

Different classes of models can be distinguished depending on the specification adopted for $\sigma_t$:


(i) Conditionally heteroscedastic (or GARCH-type) processes for which $\mathcal{F}_{t-1} = \sigma(\epsilon_s;\ s < t)$ is
the σ-field generated by the past of $\epsilon_t$. The volatility is here a deterministic function of the
past of $\epsilon_t$. Processes of this class differ by the choice of a specification for this function.
The standard GARCH models are characterized by a volatility specified as a linear function
of the past values of $\epsilon_t^2$. They will be studied in detail in Chapter 2.


(ii) Stochastic volatility processes⁷ for which $\mathcal{F}_{t-1}$ is the σ-field generated by $\{v_t, v_{t-1}, \dots\}$,
where $(v_t)$ is a strong white noise and is independent of $(\eta_t)$. In these models, volatility is a
latent process. The most popular model in this class assumes that the process $\log\sigma_t$ follows
an AR(1) of the form

$$\log\sigma_t = \omega + \phi\log\sigma_{t-1} + v_t,$$

where the noises $(v_t)$ and $(\eta_t)$ are independent.


(iii) Switching-regime models for which $\sigma_t = \sigma(\Delta_t, \mathcal{F}_{t-1})$, where $(\Delta_t)$ is a latent (unobservable)
integer-valued process, independent of $(\eta_t)$. The state of the variable $\Delta_t$ is here interpreted
as a regime and, conditionally on this state, the volatility of $\epsilon_t$ has a GARCH specification.
The process $(\Delta_t)$ is generally supposed to be a finite-state Markov chain. The models are
thus called Markov-switching models.
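The stochastic volatility model of class (ii) can be simulated directly from its AR(1) equation. The sketch below is a minimal illustration, assuming Gaussian $\eta_t$ and Gaussian $v_t$ with standard deviation `sigma_v` (the text only requires $(v_t)$ to be a strong white noise independent of $(\eta_t)$), $|\phi| < 1$, and arbitrary parameter values; `simulate_sv` is a hypothetical name. The last lines compute lag-1 autocorrelations of the returns and of their squares, which should be close to zero and clearly positive, respectively.

```python
import numpy as np


def simulate_sv(n, omega, phi, sigma_v, rng, burn_in=500):
    """Simulate eps_t = sigma_t * eta_t with log sigma_t = omega + phi * log sigma_{t-1} + v_t.

    Assumptions (for illustration): |phi| < 1, eta_t ~ N(0, 1) and v_t ~ N(0, sigma_v^2),
    the two sequences being mutually independent.
    """
    log_sigma = omega / (1.0 - phi)      # start at the stationary mean of log sigma_t
    eps = np.empty(n)
    log_sigmas = np.empty(n)
    for t in range(n + burn_in):
        log_sigma = omega + phi * log_sigma + sigma_v * rng.standard_normal()
        if t >= burn_in:
            eps[t - burn_in] = np.exp(log_sigma) * rng.standard_normal()
            log_sigmas[t - burn_in] = log_sigma
    return eps, log_sigmas


rng = np.random.default_rng(2)
eps, log_sigmas = simulate_sv(20_000, omega=-0.1, phi=0.9, sigma_v=0.2, rng=rng)
sq = eps**2
lag1_corr_sq = np.corrcoef(sq[:-1], sq[1:])[0, 1]   # autocorrelation of squares at lag 1
lag1_corr = np.corrcoef(eps[:-1], eps[1:])[0, 1]    # autocorrelation of returns at lag 1
print(lag1_corr, lag1_corr_sq)
```

Here the volatility path `log_sigmas` is available because the data are simulated; in applications it is latent, which is what makes these models harder to estimate than GARCH-type processes.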


6 There is no general agreement concerning the definition of this concept in the literature. Volatility some- 
times refers to a conditional standard deviation, and sometimes to a conditional variance. 
7 Note, however, that the volatility is also a random variable in GARCH-type processes. 
1.5 Bibliographical Notes


The time series concepts presented in this chapter are the subject of numerous books. Two classical 
references are Brockwell and Davis (1991) and Gouriéroux and Monfort (1995, 1996). 

The assumption of iid Gaussian price variations has long been predominant in the finance 
literature and goes back to the dissertation by Bachelier (1900), where a precursor of Brownian 
motion can be found. This thesis, ignored for a long time until its rediscovery by Kolmogorov in 
1931 (see Kahane, 1998), constitutes the historical source of the link between Brownian motion 
and mathematical finance. Nonetheless, it relies on only a rough description of the behavior of 
financial series. The stylized facts concerning these series can be attributed to Mandelbrot (1963) 
and Fama (1965). Based on the analysis of many stock returns series, their studies showed the 
leptokurticity, hence the non-Gaussianity, of marginal distributions, some temporal dependencies 
and nonconstant volatilities. Since then, many empirical studies have confirmed these findings. 
See, for instance, Taylor (2007) for a recent presentation of the stylized facts of financial time
series. In particular, the calendar effects are discussed in detail.

As noted by Shephard (2005), a precursor article on ARCH models is that of Rosenberg (1972). 
This article shows that the decomposition (1.6) allows the leptokurticity of financial series to be 
reproduced. It also proposes some volatility specifications which anticipate both the GARCH and 
stochastic volatility models. However, the GARCH models to be studied in the next chapters are 
not discussed in this article. The decomposition of the kurtosis coefficient in (1.7) can be found in 
Clark (1973). 

A number of surveys have been devoted to GARCH models. See, among others, Boller- 
slev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994), Pagan (1996), Palm (1996), 
Shephard (1996), Kim, Shephard, and Chib (1998), Engle (2001, 2002b, 2004), Engle and Pat- 
ton (2001), Diebold (2004), Bauwens, Laurent and Rombouts (2006) and Giraitis et al. (2006). 
Moreover, the books by Gouriéroux (1997) and Xekalaki and Degiannakis (2009) are devoted to 
GARCH and several books devote a chapter to GARCH: Mills (1993), Hamilton (1994), Franses 
and van Dijk (2000), Gouriéroux and Jasiak (2001), Tsay (2002), Franke, Härdle and Hafner 
(2004), McNeil, Frey and Embrechts (2005), Taylor (2007) and Andersen et al. (2009). See also 
Mikosch (2001). 

Although the focus of this book is on financial applications, it is worth mentioning that GARCH 
models have been used in other areas. Time series exhibiting GARCH-type behavior have also 
appeared, for example, in speech signals (Cohen, 2004; Cohen, 2006; Abramson and Cohen, 
2008), daily and monthly temperature measurements (Tol, 1996; Campbell and Diebold, 2005; 
Romilly, 2005; Huang, Shiu, and Lin, 2008), wind speeds (Ewing, Kruse, and Schroeder, 2006), 
and atmospheric CO2 concentrations (Hoti, McAleer, and Chan, 2005; McAleer and Chan, 2006). 

Most econometric software (for instance, GAUSS, R, RATS, SAS and SPSS) incorporates 
routines that permit the estimation of GARCH models. Readers interested in the implementation 
with Ox may refer to Laurent (2009). 

Stochastic volatility models are not treated in this book. One may refer to the book by Taylor 
(2007), and to the references therein. For regime-switching models, two recent references are the
monographs by Cappé, Moulines and Rydén (2005), and by Frühwirth-Schnatter (2006).


1.6 Exercises 


1.1 (Stationarity, ARMA models, white noises) 
Let $(\eta_t)$ denote an iid centered sequence with unit variance (and if necessary with a finite
fourth-order moment).

1. Do the following models admit a stationary solution? If yes, derive the expectation and
the autocorrelation function of this solution.

(a) $X_t = 1 + 0.5X_{t-1} + \eta_t$;
(b) $X_t = 1 + 2X_{t-1} + \eta_t$;
(c) $X_t = 1 + 0.5X_{t-1} + \eta_t - 0.4\eta_{t-1}$.

2. Identify the ARMA models compatible with the following recursive relations, where $\rho(\cdot)$
denotes the autocorrelation function of some stationary process:

(a) $\rho(h) = 0.4\rho(h-1)$, for all $h > 2$;
(b) $\rho(h) = 0$, for all $h > 3$;
(c) $\rho(h) = 0.2\rho(h-2)$, for all $h > 1$.

3. Verify that the following processes are white noises and decide if they are weak or strong.

(a) $\epsilon_t = \eta_t^2 - 1$;
(b) $\epsilon_t = \eta_t\eta_{t-1}$.


1.2 (A property of the sum of the sample autocorrelations)
Let

$$\hat\gamma(h) = \hat\gamma(-h) = \frac{1}{n}\sum_{t=1}^{n-h}(X_t - \bar X_n)(X_{t+h} - \bar X_n), \qquad h = 0, \dots, n-1,$$

denote the sample autocovariances of real observations $X_1, \dots, X_n$. Set $\hat\rho(h) = \hat\rho(-h) =
\hat\gamma(h)/\hat\gamma(0)$ for $h = 0, \dots, n-1$. Show that

$$\sum_{h=1}^{n-1}\hat\rho(h) = -\frac{1}{2}.$$


1.3 (It is impossible to decide whether a process is stationary from a path)
Show that the sequence $\{(-1)^t\}_{t=0,1,\dots}$ can be a realization of a nonstationary process. Show
that it can also be a realization of a stationary process. Comment on the consequences of this
result.


1.4 (Stationarity and ergodicity from a path)
Can the sequence 0, 1, 0, 1, ... be a realization of a stationary process or of a stationary and
ergodic process? The definition of ergodicity can be found in Appendix A.1.


1.5 (A weak white noise which is not semi-strong)
Let $(\eta_t)$ denote an iid $\mathcal{N}(0, 1)$ sequence and let $k$ be a positive integer. Set $\epsilon_t = \eta_t\eta_{t-1}\cdots\eta_{t-k}$.
Show that $(\epsilon_t)$ is a weak white noise, but is not a strong white noise.


1.6 (Asymptotic variance of sample autocorrelations of a weak white noise)
Consider the white noise $\epsilon_t$ of Exercise 1.5. Compute $\lim_{n\to\infty} n\operatorname{Var}\hat\rho(h)$ where $h \neq 0$ and
$\hat\rho(\cdot)$ denotes the sample autocorrelation function of $\epsilon_1, \dots, \epsilon_n$. Compare this asymptotic
variance with that obtained from the usual Bartlett formula.


1.7 (ARMA representation of the square of a weak white noise)
Consider the white noise $\epsilon_t$ of Exercise 1.5. Show that $\epsilon_t^2$ follows an ARMA process. Make
the ARMA representation explicit when $k = 1$.


1.8 (Asymptotic variance of sample autocorrelations of a weak white noise)
Repeat Exercise 1.6 for the weak white noise $\epsilon_t = \eta_t/\eta_{t-k}$, where $(\eta_t)$ is an iid sequence
such that $E\eta_t^4 < \infty$ and $E\eta_t^{-2} < \infty$, and $k$ is a positive integer.
Figure 1.6 Sample autocorrelations $\hat\rho(h)$ ($h = 1, \dots, 36$) of (a) the S&P 500 index from
January 3, 1979 to December 30, 2001, and (b) the squared index. The interval between the
dashed lines ($\pm 1.96/\sqrt{n}$, where $n = 5804$ is the sample length) should contain approximately
95% of the sample autocorrelations of a strong white noise.


1.9 (Stationary solutions of an AR(1))
Let $(\eta_t)_{t\in\mathbb{Z}}$ be an iid centered sequence with variance $\sigma^2 > 0$, and let $a \neq 0$. Consider the
AR(1) equation

$$X_t - aX_{t-1} = \eta_t, \qquad t \in \mathbb{Z}. \tag{1.8}$$

1. Show that for $|a| < 1$, the infinite sum

$$X_t = \sum_{k=0}^{\infty} a^k\eta_{t-k}$$

converges in quadratic mean and almost surely, and that it is the unique stationary solution
of (1.8).

2. For $|a| = 1$, show that no stationary solution exists.

3. For $|a| > 1$, show that

$$X_t = -\sum_{k=1}^{\infty}\frac{1}{a^k}\eta_{t+k}$$

is the unique stationary solution of (1.8).

4. For $|a| > 1$, show that the causal representation

$$X_t - \frac{1}{a}X_{t-1} = \epsilon_t, \qquad t \in \mathbb{Z}, \tag{1.9}$$

holds, where $(\epsilon_t)_{t\in\mathbb{Z}}$ is a white noise.


1.10 (Is the S&P 500 a white noise?)
Figure 1.6 displays the correlogram of the S&P 500 returns from January 3, 1979 to
December 30, 2001, as well as the correlogram of the squared returns. Is it reasonable to
think that this index is a strong white noise or a weak white noise?


1.11 (Asymptotic covariance of sample autocovariances)
Justify the equivalence between (B.18) and (B.14) in the proof of the generalized Bartlett
formula of Appendix B.2.


1.12 (Asymptotic independence between the $\hat\rho(h)$ for a noise)
Simplify the generalized Bartlett formulas (B.14) and (B.15) when $X = \epsilon$ is a pure white
noise.
In an autocorrelogram, consider the random number $M$ of sample autocorrelations falling
outside the significance region (at the level 95%, say), among the first $m$ autocorrelations.
How can the previous result be used to evaluate the variance of this number when the observed
process is a white noise (satisfying the assumptions allowing (B.15) to be used)?


1.13 (An incorrect interpretation of autocorrelograms)
Some practitioners tend to be satisfied with an estimated model only if all sample autocorrelations
fall within the 95% significance bands. Show, using Exercise 1.12, that based on 20
autocorrelations, say, this approach leads to wrongly rejecting a white noise with a very high
probability.


1.14 (Computation of partial autocorrelations)
Use the algorithm in (B.7)–(B.9) to compute $r_X(1)$, $r_X(2)$ and $r_X(3)$ as a function of $\rho_X(1)$,
$\rho_X(2)$ and $\rho_X(3)$.


1.15 (Empirical application)
Download, from http://fr.biz.yahoo.com//bourse/accueil.html for instance, a
stock index such as the CAC 40. Draw the series of closing prices, the series of returns, the
autocorrelation function of the returns, and that of the squared returns. Comment on these
graphs.


Part I 
Univariate GARCH Models 


GARCH(p, q) Processes 


Autoregressive conditionally heteroscedastic (ARCH) models were introduced by Engle (1982) 
and their GARCH (generalized ARCH) extension is due to Bollerslev (1986). In these models, 
the key concept is the conditional variance, that is, the variance conditional on the past. In the 
classical GARCH models, the conditional variance is expressed as a linear function of the squared 
past values of the series. This particular specification is able to capture the main stylized facts 
characterizing financial series, as described in Chapter 1. At the same time, it is simple enough to 
allow for a complete study of the solutions. The ‘linear’ structure of these models can be displayed 
through several representations that will be studied in this chapter. 

We first present definitions and representations of GARCH models. Then we establish the 
strict and second-order stationarity conditions. Starting with the first-order GARCH model, for 
which the proofs are easier and the results are more explicit, we extend the study to the general 
case. We also study the so-called ARCH(∞) models, which allow for a slower decay of squared-return
autocorrelations. Then, we consider the existence of moments and the properties of the
autocorrelation structure. We conclude this chapter by examining forecasting issues. 


2.1 Definitions and Representations 
We start with a definition of GARCH processes based on the first two conditional moments. 
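Definition 2.1 (GARCH(p, q) process) A process $(\epsilon_t)$ is called a GARCH(p, q) process if its
first two conditional moments exist and satisfy:

(i) $E(\epsilon_t \mid \epsilon_u, u < t) = 0$, $t \in \mathbb{Z}$.

(ii) There exist constants $\omega$, $\alpha_i$, $i = 1, \dots, q$ and $\beta_j$, $j = 1, \dots, p$ such that

$$\sigma_t^2 = \operatorname{Var}(\epsilon_t \mid \epsilon_u, u < t) = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \qquad t \in \mathbb{Z}. \tag{2.1}$$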
Equation (2.1) can be written in a more compact way as

$$\sigma_t^2 = \omega + \alpha(B)\epsilon_t^2 + \beta(B)\sigma_t^2, \qquad t \in \mathbb{Z}, \tag{2.2}$$

where $B$ is the standard backshift operator ($B^i\epsilon_t^2 = \epsilon_{t-i}^2$ and $B^i\sigma_t^2 = \sigma_{t-i}^2$ for any integer $i$), and
$\alpha$ and $\beta$ are polynomials of degrees $q$ and $p$, respectively:

$$\alpha(B) = \sum_{i=1}^{q}\alpha_i B^i, \qquad \beta(B) = \sum_{j=1}^{p}\beta_j B^j.$$

If $\beta(z) = 0$ we have

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 \tag{2.3}$$

and the process is called an ARCH(q) process.¹ By definition, the innovation of the process $(\epsilon_t^2)$
is the variable $\nu_t = \epsilon_t^2 - \sigma_t^2$. Substituting in (2.1) the variables $\sigma_{t-j}^2$ by $\epsilon_{t-j}^2 - \nu_{t-j}$, we get the
representation

$$\epsilon_t^2 = \omega + \sum_{i=1}^{r}(\alpha_i + \beta_i)\epsilon_{t-i}^2 + \nu_t - \sum_{j=1}^{p}\beta_j\nu_{t-j}, \qquad t \in \mathbb{Z}, \tag{2.4}$$

where $r = \max(p, q)$, with the convention $\alpha_i = 0$ ($\beta_j = 0$) if $i > q$ ($j > p$). This equation has the
linear structure of an ARMA model, allowing for simple computation of the linear predictions.
Under additional assumptions (implying the second-order stationarity of $\epsilon_t^2$), we can state that
if $(\epsilon_t)$ is GARCH(p, q), then $(\epsilon_t^2)$ is an ARMA(r, p) process. In particular, the square of an
ARCH(q) process admits, if it is stationary, an AR(q) representation. The ARMA representation
will be useful for the estimation and identification of GARCH processes.²


Remark 2.1 (Correlation of the squares of a GARCH) We observed in Chapter 1 that a
characteristic feature of financial series is that squared returns are autocorrelated, while returns are
not. The representation (2.4) shows that GARCH processes are able to capture this empirical fact.
If the fourth-order moment of $(\epsilon_t)$ is finite, the sequence of the $h$-order autocorrelations of $\epsilon_t^2$
is the solution of a recursive equation which is characteristic of ARMA models. For the sake of
simplicity, consider the GARCH(1, 1) case. The squared process $(\epsilon_t^2)$ is ARMA(1, 1), and thus its
autocorrelation decreases to zero proportionally to $(\alpha_1 + \beta_1)^h$: for $h > 1$,

$$\operatorname{Corr}(\epsilon_t^2, \epsilon_{t-h}^2) = K(\alpha_1 + \beta_1)^h,$$

where $K$ is a constant independent of $h$. Moreover, the $\epsilon_t$'s are uncorrelated in view of (i) in
Definition 2.1.
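In the GARCH(1, 1) case the constant $K$ can be made explicit by applying the standard ARMA(1, 1) autocorrelation formula to (2.4), whose autoregressive coefficient is $\phi = \alpha_1 + \beta_1$ and whose moving-average coefficient is $\theta = -\beta_1$. The sketch below does this under the assumption, stated in the remark, that $\epsilon_t$ has a finite fourth-order moment (so that $(\nu_t)$ is a white noise with finite variance); `garch11_squared_acf` is a hypothetical name.

```python
import numpy as np


def garch11_squared_acf(alpha, beta, max_lag):
    """Autocorrelations rho(h), h = 1..max_lag, of eps_t^2 for a GARCH(1, 1).

    Uses the ARMA(1, 1) representation (2.4) with phi = alpha + beta and theta = -beta,
    and the ARMA(1, 1) formulas rho(1) = (1 + phi*theta)(phi + theta) / (1 + 2*phi*theta + theta^2),
    rho(h) = phi * rho(h - 1). Valid only if E(eps_t^4) is finite.
    """
    phi, theta = alpha + beta, -beta
    rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta**2)
    return rho1 * phi ** np.arange(max_lag)   # entry h-1 is rho(h) = rho(1) * phi^(h-1)


rho = garch11_squared_acf(0.2, 0.7, max_lag=5)
K = rho[0] / (0.2 + 0.7)   # constant of Remark 2.1: rho(h) = K * (alpha + beta)^h
print(rho, K)
```

For $\alpha_1 = 0.2$ and $\beta_1 = 0.7$ this gives $\rho(1) = 0.2 \times 0.37/0.23 \approx 0.32$ and $K \approx 0.36$; each further lag multiplies the autocorrelation by $\alpha_1 + \beta_1 = 0.9$.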


Definition 2.1 does not directly provide a solution process satisfying those conditions. The 
next definition is more restrictive but allows explicit solutions to be obtained. The link between 
the two definitions will be given in Remark 2.5. Let $\eta$ denote a probability distribution with null
expectation and unit variance.


1 This specification quickly turned out to be too restrictive when applied to financial series. Indeed, a large
number of past variables have to be included in the conditional variance to obtain a good model fit. Choosing
a large value for q is not satisfactory from a statistical point of view because it requires a large number of
coefficients to be estimated.

2 It cannot be used to study the existence of stationary solutions, however, because the process $(\nu_t)$ is not
an iid process.
Definition 2.2 (Strong GARCH(p, q) process) Let $(\eta_t)$ be an iid sequence with distribution $\eta$.
The process $(\epsilon_t)$ is called a strong GARCH(p, q) (with respect to the sequence $(\eta_t)$) if

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, \\ \sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \end{cases} \tag{2.5}$$

where the $\alpha_i$ and $\beta_j$ are nonnegative constants and $\omega$ is a (strictly) positive constant.


GARCH processes in the sense of Definition 2.1 are sometimes called semi-strong following the
paper by Drost and Nijman (1993) on temporal aggregation. Substituting $\epsilon_{t-i}$ by $\sigma_{t-i}\eta_{t-i}$ in (2.1),
we get

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\sigma_{t-i}^2\eta_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2,$$

which can be written as

$$\sigma_t^2 = \omega + \sum_{i=1}^{r}a_i(\eta_{t-i})\sigma_{t-i}^2, \tag{2.6}$$

where $a_i(z) = \alpha_i z^2 + \beta_i$, $i = 1, \dots, r$, with $r = \max(p, q)$ and the same convention as in (2.4). This
representation shows that the volatility process of a strong GARCH is the solution of an
autoregressive equation with random coefficients.


Properties of Simulated Paths 


Contrary to standard time series models (ARMA), the GARCH structure allows the magnitude of
the noise $\epsilon_t$ to be a function of its past values. Thus, periods with high volatility level (corresponding
to large values of $\epsilon_{t-i}^2$) will be followed by periods where the fluctuations have a large amplitude.
Figures 2.1–2.7 illustrate the volatility clustering for simulated GARCH models. Large absolute
values are not uniformly distributed on the whole period, but tend to cluster. We will see that all
these trajectories correspond to strictly stationary processes which, except for the ARCH(1) models
of Figures 2.3–2.5, are also second-order stationary. Even if the absolute values can be extremely
large, these processes are not explosive, as can be seen from these figures. Higher values of $\alpha$
(theoretically $\alpha > 3.56$ for the $\mathcal{N}(0, 1)$ distribution, as will be established below) lead to explosive
paths. Figures 2.6 and 2.7, corresponding to GARCH(1, 1) models, have been obtained with the
same simulated sequence $(\eta_t)$. As we will see, permuting $\alpha$ and $\beta$ does not modify the variance
of the process but has an effect on the higher-order moments. For instance the simulated process
of Figure 2.7, with $\alpha = 0.7$ and $\beta = 0.2$, does not admit a fourth-order moment, in contrast to the
process of Figure 2.6. This is reflected by the presence of larger absolute values in Figure 2.7.
The two processes are also different in terms of persistence of shocks: when $\beta$ approaches 1, a
shock on the volatility has a persistent effect. On the other hand, when $\alpha$ is large, sudden volatility
variations can be observed in response to shocks.

Figure 2.1 Simulation of size 500 of the ARCH(1) process with $\omega = 1$, $\alpha = 0.5$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Figure 2.2 Simulation of size 500 of the ARCH(1) process with $\omega = 1$, $\alpha = 0.95$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Figure 2.3 Simulation of size 500 of the ARCH(1) process with $\omega = 1$, $\alpha = 1.1$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Figure 2.4 Simulation of size 200 of the ARCH(1) process with $\omega = 1$, $\alpha = 3$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Figure 2.5 Observations 100–140 of Figure 2.4.

Figure 2.6 Simulation of size 500 of the GARCH(1, 1) process with $\omega = 1$, $\alpha = 0.2$, $\beta = 0.7$ and $\eta_t \sim \mathcal{N}(0, 1)$.

Figure 2.7 Simulation of size 500 of the GARCH(1, 1) process with $\omega = 1$, $\alpha = 0.7$, $\beta = 0.2$ and $\eta_t \sim \mathcal{N}(0, 1)$.
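Paths such as those of Figures 2.1–2.7 are straightforward to generate. The sketch below simulates the strong GARCH(1, 1) model with Gaussian $\eta_t$ (setting $\beta = 0$ gives the ARCH(1) case); the burn-in length, the starting values and the seeds are arbitrary choices, not those used for the figures, and `simulate_garch11` is a hypothetical name. Reusing the same seed mimics the device of Figures 2.6 and 2.7, where both models are driven by the same sequence $(\eta_t)$.

```python
import numpy as np


def simulate_garch11(n, omega, alpha, beta, seed, burn_in=500):
    """Simulate n observations of eps_t = sigma_t * eta_t,
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2, with eta_t iid N(0, 1)."""
    eta = np.random.default_rng(seed).standard_normal(n + burn_in)
    eps = np.empty(n + burn_in)
    sigma2, eps_prev = omega, 0.0          # arbitrary initial values, discarded with the burn-in
    for t in range(n + burn_in):
        sigma2 = omega + alpha * eps_prev**2 + beta * sigma2
        eps[t] = np.sqrt(sigma2) * eta[t]
        eps_prev = eps[t]
    return eps[burn_in:]


# Same innovations for both GARCH(1, 1) models, as in Figures 2.6 and 2.7.
path_a = simulate_garch11(500, omega=1.0, alpha=0.2, beta=0.7, seed=123)
path_b = simulate_garch11(500, omega=1.0, alpha=0.7, beta=0.2, seed=123)
arch_path = simulate_garch11(200, omega=1.0, alpha=3.0, beta=0.0, seed=7)   # cf. Figure 2.4
print(np.abs(path_a).max(), np.abs(path_b).max(), np.abs(arch_path).max())
```

Because the sign of $\epsilon_t$ is that of $\eta_t$, the two GARCH(1, 1) paths have identical signs at every date and differ only in their magnitudes.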


2.2 Stationarity Study 


This section is concerned with the existence of stationary solutions (in the strict and second-order 
senses) to model (2.5). We are mainly interested in nonanticipative solutions, that is, processes
$(\epsilon_t)$ such that $\epsilon_t$ is a measurable function of the variables $\eta_{t-s}$, $s \ge 0$. For such processes, $\sigma_t$
is independent of the σ-field generated by $\{\eta_{t+h}, h \ge 0\}$ and $\epsilon_t$ is independent of the σ-field
generated by $\{\eta_{t+h}, h > 0\}$. It will be seen that such solutions are also ergodic. The concept of
ergodicity is discussed in Appendix A.1. We first consider the GARCH(1, 1) model, which can be
studied in a more explicit way than the general case. For $x > 0$, let $\log^+ x = \max(\log x, 0)$.


2.2.1 The GARCH(1, 1) Case 
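When $p = q = 1$, model (2.5) has the form

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, & (\eta_t) \text{ iid } (0, 1), \\ \sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \end{cases} \tag{2.7}$$

with $\omega \ge 0$, $\alpha \ge 0$, $\beta \ge 0$. Let $a(z) = \alpha z^2 + \beta$.

Theorem 2.1 (Strict stationarity of the strong GARCH(1, 1) process) If

$$-\infty \le \gamma := E\log\{\alpha\eta_t^2 + \beta\} < 0, \tag{2.8}$$

then the infinite sum

$$h_t = \Big\{1 + \sum_{i=1}^{\infty} a(\eta_{t-1})\cdots a(\eta_{t-i})\Big\}\omega \tag{2.9}$$

converges almost surely (a.s.) and the process $(\epsilon_t)$ defined by $\epsilon_t = \sqrt{h_t}\,\eta_t$ is the unique strictly
stationary solution of model (2.7). This solution is nonanticipative and ergodic. If $\gamma \ge 0$ and $\omega > 0$,
there exists no strictly stationary solution.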


Remark 2.2 (On the strict stationarity condition (2.8)) 


1. When $\omega = 0$ and $\gamma < 0$, it is clear that, in view of (2.9), the unique strictly stationary
solution is $\epsilon_t = 0$. It is therefore natural to impose $\omega > 0$.

2. It may be noted that the condition (2.8) depends on the distribution of $\eta_t$ and that it is not
symmetric in $\alpha$ and $\beta$.

3. Examination of the proof below shows that the assumptions $E\eta_t = 0$ and $E\eta_t^2 = 1$,
which facilitate the interpretation of the model, are not necessary. It is sufficient to have
$E\log^+\eta_t^2 < \infty$.

4. Condition (2.8) implies $\beta < 1$. Now, if

$$\alpha + \beta < 1,$$

then (2.8) is satisfied since, by application of the Jensen inequality,

$$E\log\{a(\eta_t)\} \le \log E\{a(\eta_t)\} = \log(\alpha + \beta) < 0.$$

5. If (2.8) is satisfied, it is also satisfied for any pair $(\alpha_1, \beta_1)$ such that $\alpha_1 \le \alpha$ and $\beta_1 \le \beta$. In
particular, the strict stationarity of a given GARCH(1, 1) model implies that the ARCH(1)
model obtained by canceling $\beta$ is also stationary.

6. In the ARCH(1) case ($\beta = 0$), the strict stationarity constraint is written as

$$0 \le \alpha < \exp\{-E(\log\eta_t^2)\}. \tag{2.10}$$

For instance when $\eta_t \sim \mathcal{N}(0, 1)$ the condition becomes $\alpha < 3.56$. For a distribution such
that $E(\log\eta_t^2) = -\infty$, for instance with a mass at 0, condition (2.10) is always satisfied. For
such distributions, a strictly stationary ARCH(1) solution exists whatever the value of $\alpha$.
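The constant 3.56 in Remark 2.2(6) can be made explicit. When $\eta_t \sim \mathcal{N}(0, 1)$, $\eta_t^2$ follows a $\chi^2_1$ distribution and $E\log\eta_t^2 = -(\gamma_{\rm E} + \log 2) \approx -1.2704$, where $\gamma_{\rm E} \approx 0.5772$ is Euler's constant (not to be confused with the exponent $\gamma$ of (2.8)); the bound in (2.10) is therefore $2e^{\gamma_{\rm E}} \approx 3.562$. The sketch below computes this value and a Monte Carlo estimate of $\gamma$ for a few pairs $(\alpha, \beta)$; the function name `gamma_mc` and the sample size are illustrative assumptions.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329                 # Euler's constant
E_LOG_ETA2 = -(EULER_GAMMA + math.log(2.0))      # E log(eta^2) for eta ~ N(0, 1)
ARCH1_BOUND = math.exp(-E_LOG_ETA2)              # bound exp{-E log(eta^2)} in (2.10)


def gamma_mc(alpha, beta, n=1_000_000, seed=0):
    """Monte Carlo estimate of gamma = E log(alpha * eta^2 + beta), eta ~ N(0, 1)."""
    eta = np.random.default_rng(seed).standard_normal(n)
    return float(np.mean(np.log(alpha * eta**2 + beta)))


print(f"ARCH(1) strict stationarity bound: {ARCH1_BOUND:.4f}")
for alpha, beta in [(3.0, 0.0), (4.0, 0.0), (0.3, 0.6), (0.7, 0.2)]:
    print(alpha, beta, round(gamma_mc(alpha, beta), 3))
```

The ARCH(1) models with $\alpha = 3$ and $\alpha = 4$ fall on either side of the bound, and for the pairs with $\alpha + \beta < 1$ the estimate is negative, in fact strictly below $\log(\alpha + \beta)$ as Jensen's inequality in Remark 2.2(4) predicts.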


Proof of Theorem 2.1. Note that the coefficient $\gamma = E\log\{a(\eta_t)\}$ always exists in $[-\infty, +\infty)$
because $E\log^+\{a(\eta_t)\} \le Ea(\eta_t) = \alpha + \beta$. Using iteratively the second equation in model (2.7),
we get, for $N \ge 1$,

$$\begin{aligned}
\sigma_t^2 &= \omega + a(\eta_{t-1})\sigma_{t-1}^2 \\
&= \omega\Big\{1 + \sum_{n=1}^{N} a(\eta_{t-1})\cdots a(\eta_{t-n})\Big\} + a(\eta_{t-1})\cdots a(\eta_{t-N-1})\sigma_{t-N-1}^2 \\
&:= h_t(N) + a(\eta_{t-1})\cdots a(\eta_{t-N-1})\sigma_{t-N-1}^2.
\end{aligned} \tag{2.11}$$

The limit process $h_t = \lim_{N\to\infty} h_t(N)$ exists in $\overline{\mathbb{R}}^+ = [0, +\infty]$ since the summands are
nonnegative. Moreover, letting $N$ go to infinity in $h_t(N) = \omega + a(\eta_{t-1})h_{t-1}(N-1)$, we get

$$h_t = \omega + a(\eta_{t-1})h_{t-1}.$$

We now show that $h_t$ is almost surely finite if and only if $\gamma < 0$.
Suppose that $\gamma < 0$. We will use the Cauchy rule for series with nonnegative terms.³ We have

$$[a(\eta_{t-1})\cdots a(\eta_{t-n})]^{1/n} = \exp\Big[\frac{1}{n}\sum_{i=1}^{n}\log\{a(\eta_{t-i})\}\Big] \to e^{\gamma} \quad \text{a.s.} \tag{2.12}$$

as $n \to \infty$, by application of the strong law of large numbers to the iid sequence $(\log\{a(\eta_t)\})$.⁴
The series defined in (2.9) thus converges almost surely in $\mathbb{R}$, by application of the Cauchy rule,


3 Let $\sum a_n$ be a series with nonnegative terms and let $\lambda = \limsup a_n^{1/n}$. Then (i) if $\lambda < 1$ the series $\sum a_n$
converges, (ii) if $\lambda > 1$ the series $\sum a_n$ diverges.

4 If $(X_i)$ is an iid sequence of random variables admitting an expectation, which can be infinite, then
$\frac{1}{n}\sum_{i=1}^{n} X_i \to EX_1$ a.s. This result, which can be found in Billingsley (1995), follows from the strong law of
large numbers for integrable variables: suppose, for instance, that $E(X_i^+) = +\infty$ and let, for any integer $m > 0$,
$\tilde X_i = X_i^+$ if $0 \le X_i^+ \le m$, and $\tilde X_i = 0$ otherwise. Then $\frac{1}{n}\sum_{i=1}^{n} X_i^+ \ge \frac{1}{n}\sum_{i=1}^{n}\tilde X_i \to E\tilde X_1$ a.s., by application
of the strong law of large numbers to the sequence of integrable variables $\tilde X_i$. When $m$ goes to infinity, the
increasing sequence $E\tilde X_1$ converges to $+\infty$, which allows us to conclude that $\frac{1}{n}\sum_{i=1}^{n} X_i^+ \to \infty$ a.s.
and the limit process $(h_t)$ takes positive real values. It follows that the process $(\epsilon_t)$ defined by

$$\epsilon_t = \sqrt{h_t}\,\eta_t = \Big\{\omega + \sum_{i=1}^{\infty} a(\eta_{t-1})\cdots a(\eta_{t-i})\,\omega\Big\}^{1/2}\eta_t \tag{2.13}$$

is strictly stationary and ergodic (see Appendix A.1, Theorem A.1). Moreover, $(\epsilon_t)$ is a
nonanticipative solution of model (2.7).

We now prove the uniqueness. Let $\tilde\epsilon_t = \tilde\sigma_t\eta_t$ denote another strictly stationary solution. By
(2.11) we have

$$\tilde\sigma_t^2 = h_t(N) + a(\eta_{t-1})\cdots a(\eta_{t-N-1})\tilde\sigma_{t-N-1}^2.$$

It follows that

$$\tilde\sigma_t^2 - h_t = \{h_t(N) - h_t\} + a(\eta_{t-1})\cdots a(\eta_{t-N-1})\tilde\sigma_{t-N-1}^2.$$

The term in brackets on the right-hand side tends to 0 a.s. as $N \to \infty$. Moreover, since
the series defining $h_t$ converges a.s., we have $a(\eta_{t-1})\cdots a(\eta_{t-n}) \to 0$ with probability 1 as
$n \to \infty$. In addition, the distribution of $\tilde\sigma_{t-N-1}^2$ is independent of $N$ by stationarity. Therefore,
$a(\eta_{t-1})\cdots a(\eta_{t-N-1})\tilde\sigma_{t-N-1}^2 \to 0$ in probability as $N \to \infty$. We have proved that $\tilde\sigma_t^2 - h_t \to 0$
in probability as $N \to \infty$. This term being independent of $N$, we necessarily have $h_t = \tilde\sigma_t^2$ for
any $t$, a.s.

If $\gamma > 0$, from (2.12) and the Cauchy rule, $\sum_{n=1}^{N} a(\eta_{t-1})\cdots a(\eta_{t-n}) \to +\infty$ a.s. as $N \to \infty$.
Hence, if $\omega > 0$, $h_t = +\infty$ a.s. By (2.11), it is clear that $\sigma_t^2 = +\infty$ a.s. It follows that there exists
no almost surely finite solution to (2.7).

For $\gamma = 0$, we give a proof by contradiction. Suppose there exists a strictly stationary solution
$(\epsilon_t, \sigma_t^2)$ of (2.7). We have, for $n > 0$,

$$\sigma_0^2 \ge \omega\Big\{1 + \sum_{i=1}^{n} a(\eta_{-1})\cdots a(\eta_{-i})\Big\},$$

from which we deduce that $a(\eta_{-1})\cdots a(\eta_{-n})\,\omega$ converges to zero, a.s., as $n \to \infty$, or, equivalently,
that

$$\sum_{i=1}^{n}\log a(\eta_{-i}) + \log\omega \to -\infty \quad \text{a.s. as } n \to \infty. \tag{2.14}$$

By the Chung–Fuchs theorem⁵ we have $\limsup_{n\to\infty}\sum_{i=1}^{n}\log a(\eta_{-i}) = +\infty$ with probability 1, which
contradicts (2.14).


The next result shows that nonstationary GARCH processes are explosive.
Corollary 2.1 (Conditions of explosion) For the GARCH(1, 1) model defined by (2.7) for $t \ge 1$,
with initial conditions for $\epsilon_0$ and $\sigma_0$,

$$\gamma > 0 \implies \sigma_t^2 \to +\infty, \quad \text{a.s. } (t \to \infty).$$

If, in addition, $E|\log(\eta_t^2)| < \infty$, then

$$\gamma > 0 \implies \epsilon_t^2 \to +\infty, \quad \text{a.s. } (t \to \infty).$$


5 If $X_1, \dots, X_n$ is an iid sequence such that $EX_1 = 0$ and $E|X_1| > 0$, then $\limsup_{n\to\infty}\sum_{i=1}^{n} X_i = +\infty$
and $\liminf_{n\to\infty}\sum_{i=1}^{n} X_i = -\infty$ (see, for instance, Chow and Teicher, 1997).
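Proof. We have

$$\sigma_t^2 \ge \omega\Big\{1 + \sum_{i=1}^{t-1} a(\eta_{t-1})\cdots a(\eta_{t-i})\Big\} \ge \omega\,a(\eta_{t-1})\cdots a(\eta_1). \tag{2.15}$$

Hence,

$$\liminf_{t\to\infty}\frac{1}{t}\log\sigma_t^2 \ge \liminf_{t\to\infty}\frac{1}{t}\sum_{i=1}^{t-1}\log a(\eta_{t-i}) = \gamma.$$

Thus $\log\sigma_t^2 \to \infty$ and $\sigma_t^2 \to \infty$ a.s. since $\gamma > 0$. By the same arguments,

$$\liminf_{t\to\infty}\frac{1}{t}\log\epsilon_t^2 = \liminf_{t\to\infty}\frac{1}{t}\left(\log\sigma_t^2 + \log\eta_t^2\right) \ge \gamma + \liminf_{t\to\infty}\frac{1}{t}\log\eta_t^2 = \gamma$$

using Exercise 2.11. The conclusion follows.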
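The proof shows that $\frac{1}{t}\log\sigma_t^2$ is asymptotically bounded below by $\gamma$, and this rate can be observed on simulated paths. Because $\sigma_t^2$ grows geometrically when $\gamma > 0$, the sketch below works on the log scale, computing $\log(\omega + a(\eta_{t-1})\sigma_{t-1}^2)$ with `numpy.logaddexp` to avoid overflow. It is a minimal illustration assuming Gaussian $\eta_t$; the helper name and the ARCH(1) parameter values are illustrative.

```python
import numpy as np


def log_sigma2_path(n, omega, alpha, beta, seed, log_sigma2_0=0.0):
    """Return log(sigma_t^2), t = 1..n, for the GARCH(1, 1) recursion (2.7) started at sigma_0^2,
    using sigma_t^2 = omega + a(eta_{t-1}) * sigma_{t-1}^2 with a(z) = alpha * z^2 + beta."""
    eta = np.random.default_rng(seed).standard_normal(n)
    out = np.empty(n)
    log_s2 = log_sigma2_0
    for t in range(n):
        # log(omega + a(eta) * sigma^2) = logaddexp(log omega, log a(eta) + log sigma^2)
        log_s2 = np.logaddexp(np.log(omega), np.log(alpha * eta[t] ** 2 + beta) + log_s2)
        out[t] = log_s2
    return out


n = 20_000
explosive = log_sigma2_path(n, omega=1.0, alpha=5.0, beta=0.0, seed=1)   # gamma = log 5 - 1.2704 > 0
stationary = log_sigma2_path(n, omega=1.0, alpha=1.0, beta=0.0, seed=1)  # gamma = -1.2704 < 0
print(explosive[-1] / n, stationary[-1] / n)
```

In the explosive case $\frac{1}{t}\log\sigma_t^2$ settles near $\gamma \approx 0.34$, whereas for the strictly stationary ARCH(1) with $\alpha = 1$ it tends to 0.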


Remark 2.3 (On Corollary 2.1)

1. When $\gamma = 0$, Klüppelberg, Lindner and Maller (2004) showed that $\sigma_t^2 \to \infty$ in probability.

2. Since, by Jensen's inequality, we have $E\log(\eta_t^2) < \infty$, the restriction $E|\log(\eta_t^2)| < \infty$
means $E\log(\eta_t^2) > -\infty$. In the ARCH(1) case this restriction vanishes because the condition
$\gamma = E\log\alpha\eta_t^2 > 0$ implies $E\log(\eta_t^2) > -\infty$.


Theorem 2.2 (Second-order stationarity of the GARCH(1, 1) process) Let $\omega > 0$. If $\alpha + \beta \ge 1$,
a nonanticipative and second-order stationary solution to the GARCH(1, 1) model does not exist.
If $\alpha + \beta < 1$, the process $(\epsilon_t)$ defined by (2.13) is second-order stationary. More precisely, $(\epsilon_t)$ is
a weak white noise. Moreover, there exists no other second-order stationary and nonanticipative
solution.


Proof. If $(\epsilon_t)$ is a GARCH(1, 1) process, in the sense of Definition 2.1, which is second-order
stationary and nonanticipative, we have

$$E(\epsilon_t^2) = E\{E(\epsilon_t^2 \mid \epsilon_u, u < t)\} = E(\sigma_t^2) = \omega + (\alpha + \beta)E(\epsilon_{t-1}^2),$$

that is,

$$(1 - \alpha - \beta)E(\epsilon_t^2) = \omega.$$

Hence, we must have $\alpha + \beta < 1$. In addition, we get $E(\epsilon_t^2) > 0$. Conversely, suppose $\alpha + \beta < 1$.
By Remark 2.2(4), the strict stationarity condition is satisfied. It is thus sufficient to show that
the strictly stationary solution defined in (2.13) admits a finite variance. The variable $h_t$ being an
increasing limit of positive random variables, the infinite sum and the expectation can be permuted
to give

$$E(\epsilon_t^2) = E(h_t) = \Big[1 + \sum_{n=1}^{+\infty} E\{a(\eta_{t-1})\cdots a(\eta_{t-n})\}\Big]\omega = \Big[1 + \sum_{n=1}^{+\infty}(\alpha + \beta)^n\Big]\omega = \frac{\omega}{1 - (\alpha + \beta)}.$$
This proves the second-order stationarity of the solution. Moreover, this solution is a white noise
because $E(\epsilon_t) = E\{E(\epsilon_t \mid \epsilon_u, u < t)\} = 0$ and for all $h > 0$,

$$\operatorname{Cov}(\epsilon_t, \epsilon_{t-h}) = E\{\epsilon_{t-h}E(\epsilon_t \mid \epsilon_u, u < t)\} = 0.$$

Let $\tilde\epsilon_t = \sqrt{\tilde h_t}\,\eta_t$ denote another second-order and nonanticipative stationary solution. We have

$$|h_t - \tilde h_t| = a(\eta_{t-1})\cdots a(\eta_{t-n})\,|h_{t-n} - \tilde h_{t-n}|,$$

and then

$$E|h_t - \tilde h_t| = E\{a(\eta_{t-1})\cdots a(\eta_{t-n})\}\,E|h_{t-n} - \tilde h_{t-n}| = (\alpha + \beta)^n E|h_{t-n} - \tilde h_{t-n}|.$$

Notice that the second equality uses the fact that the solutions are nonanticipative. This assumption
was not necessary to establish the uniqueness of the strictly stationary solution. The expectation of
$|h_{t-n} - \tilde h_{t-n}|$ being bounded by $E|h_{t-n}| + E|\tilde h_{t-n}|$, which is finite and independent of $n$
by stationarity, and since $(\alpha + \beta)^n$ tends to 0 when $n \to \infty$, we obtain $E|h_t - \tilde h_t| = 0$ and thus
$h_t = \tilde h_t$ for all $t$, a.s.
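The identity $(1 - \alpha - \beta)E(\epsilon_t^2) = \omega$ can be checked on the explicit solution (2.13): truncate the series (2.9) after $N$ terms, draw many independent copies of $(\eta_{t-1}, \dots, \eta_{t-N})$, and average. The sketch below assumes Gaussian $\eta_t$ and the illustrative values $\omega = 1$, $\alpha = 0.2$, $\beta = 0.7$; the truncation length is an arbitrary choice that makes the neglected tail of order $(\alpha + \beta)^N$ in expectation, and `truncated_h` is a hypothetical name.

```python
import numpy as np


def truncated_h(omega, alpha, beta, n_terms, n_draws, seed=0):
    """Independent draws of h_t(N) = omega * {1 + sum_{i=1}^N a(eta_{t-1}) ... a(eta_{t-i})}."""
    eta = np.random.default_rng(seed).standard_normal((n_draws, n_terms))
    a = alpha * eta**2 + beta                  # a(eta_{t-1}), ..., a(eta_{t-N}) along each row
    products = np.cumprod(a, axis=1)           # a(eta_{t-1}) ... a(eta_{t-i}), i = 1..N
    return omega * (1.0 + products.sum(axis=1))


omega, alpha, beta = 1.0, 0.2, 0.7
h = truncated_h(omega, alpha, beta, n_terms=150, n_draws=20_000)
print(h.mean(), omega / (1.0 - alpha - beta))   # Monte Carlo E(h_t) vs omega / (1 - alpha - beta)
```

Since $E(\epsilon_t^2) = E(h_t)$, the sample mean of `h` estimates the variance of the second-order stationary solution, here $\omega/(1 - \alpha - \beta) = 10$.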


Figure 2.8 shows the zones of strict and second-order stationarity for the strong GARCH(1, 1)
model when $\eta_t \sim \mathcal{N}(0, 1)$. Note that the distribution of $\eta_t$ only matters for the strict stationarity.
As noted above, the frontier of the strict stationarity zone corresponds to a random walk (for
the process $\log(h_t - \omega)$). A similar interpretation holds for the second-order stationarity zone. If
$\alpha + \beta = 1$ we have

$$h_t = \omega + h_{t-1} + \alpha h_{t-1}(\eta_{t-1}^2 - 1).$$

Thus, since the last term in this equality is centered and uncorrelated with any variable belonging
to the past of $h_{t-1}$, the process $(h_t)$ is a random walk. The corresponding GARCH process is
called integrated GARCH (or IGARCH(1, 1)) and will be studied later: it is strictly stationary, has
an infinite variance, and a conditional variance which is a random walk (with a positive drift).
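The frontier between the strict stationarity and nonstationarity zones of Figure 2.8 is the curve $\gamma(\alpha, \beta) = E\log(\alpha\eta_t^2 + \beta) = 0$, while the second-order frontier is the line $\alpha + \beta = 1$. Since $\gamma$ is increasing in $\beta$, the strict frontier can be located by bisection for each $\alpha$. The sketch below assumes Gaussian $\eta_t$ and replaces the expectation by an average over a fixed sample of $\eta_t^2$ (common random numbers, so that the estimated curve is monotone in $\alpha$); `strict_frontier_beta` is a hypothetical name.

```python
import numpy as np


def strict_frontier_beta(alpha, eta2, tol=1e-8):
    """Return beta* > 0 such that mean(log(alpha * eta2 + beta*)) = 0, i.e. the estimated
    boundary of strict stationarity for a given alpha > 0, or None if gamma >= 0 at beta = 0."""
    def gamma(beta):
        return np.mean(np.log(alpha * eta2 + beta))

    if gamma(0.0) >= 0.0:
        return None                 # no beta >= 0 gives a strictly stationary solution
    lo, hi = 0.0, 1.0               # gamma(alpha, 1) = E log(1 + alpha * eta^2) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


eta2 = np.random.default_rng(0).standard_normal(200_000) ** 2
for alpha in (0.2, 0.5, 1.0, 2.0, 3.0, 3.7):
    print(alpha, strict_frontier_beta(alpha, eta2))
```

For $\alpha < 1$ the estimated frontier lies strictly above the line $\beta = 1 - \alpha$, which is the gap between zones 1 and 2; for $\alpha$ beyond roughly 3.56 no $\beta \ge 0$ gives strict stationarity.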


2.2.2 The General Case 


In the general case of a strong GARCH(p, q) process, the following vector representation will be
useful. We have

$$z_t = b_t + A_t z_{t-1}, \tag{2.16}$$


Figure 2.8 Stationarity regions for the GARCH(1, 1) model when $\eta_t \sim \mathcal{N}(0, 1)$ (horizontal axis $\alpha_1$, from 0 to 4; vertical axis $\beta_1$): 1, second-order
stationarity; 1 and 2, strict stationarity; 3, nonstationarity.
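where

$$b_t = b(\eta_t) = \begin{pmatrix}\omega\eta_t^2 \\ 0 \\ \vdots \\ 0 \\ \omega \\ 0 \\ \vdots \\ 0\end{pmatrix} \in \mathbb{R}^{p+q}, \qquad z_t = \begin{pmatrix}\epsilon_t^2 \\ \vdots \\ \epsilon_{t-q+1}^2 \\ \sigma_t^2 \\ \vdots \\ \sigma_{t-p+1}^2\end{pmatrix} \in \mathbb{R}^{p+q},$$

and

$$A_t = \begin{pmatrix}
\alpha_{1:q-1}\eta_t^2 & \alpha_q\eta_t^2 & \beta_{1:p-1}\eta_t^2 & \beta_p\eta_t^2 \\
I_{q-1} & 0 & 0 & 0 \\
\alpha_{1:q-1} & \alpha_q & \beta_{1:p-1} & \beta_p \\
0 & 0 & I_{p-1} & 0
\end{pmatrix} \tag{2.17}$$

is a $(p + q) \times (p + q)$ matrix, where $\alpha_{1:q-1} = (\alpha_1, \dots, \alpha_{q-1})$, $\beta_{1:p-1} = (\beta_1, \dots, \beta_{p-1})$, $I_k$ denotes the
$k \times k$ identity matrix and each $0$ is a zero block of the appropriate size. In the ARCH(q) case, $z_t$ reduces
to $\epsilon_t^2$ and its $q - 1$ first past values, and $A_t$ to the upper-left block of the above matrix. Equation (2.16)
defines a first-order vector autoregressive model, with nonnegative and iid matrix coefficients. The
distribution of $z_t$ conditional on its infinite past coincides with its distribution conditional on $z_{t-1}$
only, which means that $(z_t)$ is a Markov process. Model (2.16) is thus called the Markov
representation of the GARCH(p, q) model. Iterating (2.16) yields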


z =b, +Y AAR. Aree dy ps (2.18) 


provided that the series exists almost surely. Finding conditions ensuring the existence of this series is the object of what follows. Notice that the existence of the right-hand vector in (2.18) does not ensure that its components are positive. One sufficient condition for

$$\underline{b}_t + \sum_{k=1}^{\infty} A_tA_{t-1}\cdots A_{t-k+1}\underline{b}_{t-k} > 0, \quad \text{a.s.}, \qquad (2.19)$$

in the sense that all the components of this vector are strictly positive (but possibly infinite), is that

$$\omega > 0, \qquad \alpha_i \ge 0 \ (i = 1, \dots, q), \qquad \beta_j \ge 0 \ (j = 1, \dots, p). \qquad (2.20)$$

This condition is very simple to use but may not be necessary, as we will see in Section 2.3.2.
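To make the Markov representation concrete, here is a minimal sketch that builds $A_t$ and $\underline{b}_t$ from (2.17) and checks, on a simulated path started from zero initial values, that iterating (2.16) reproduces the direct GARCH recursion for $\sigma_t^2$. The helper names and the GARCH(2, 2) coefficients are hypothetical, chosen for illustration only, and $\eta_t$ is taken $\mathcal{N}(0, 1)$.

```python
import numpy as np

def markov_matrices(omega, alpha, beta, eta_t):
    """Return (A_t, b_t) of (2.16)-(2.17); alpha has length q >= 1, beta length p >= 1.

    State ordering: (eps_t^2, ..., eps_{t-q+1}^2, sigma_t^2, ..., sigma_{t-p+1}^2).
    """
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    q, p = len(alpha), len(beta)
    coeffs = np.concatenate([alpha, beta])
    A = np.zeros((p + q, p + q))
    A[0, :] = coeffs * eta_t**2      # eps_t^2 = sigma_t^2 * eta_t^2
    A[q, :] = coeffs                 # recursion for sigma_t^2
    for i in range(1, q):
        A[i, i - 1] = 1.0            # shift of the eps^2 block
    for j in range(1, p):
        A[q + j, q + j - 1] = 1.0    # shift of the sigma^2 block
    b = np.zeros(p + q)
    b[0] = omega * eta_t**2
    b[q] = omega
    return A, b

def compare_recursions(omega, alpha, beta, eta):
    """sigma_t^2 computed via z_t = b_t + A_t z_{t-1} and via the direct GARCH recursion."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    q, p = len(alpha), len(beta)
    z = np.zeros(p + q)
    eps2_past, sig2_past = np.zeros(q), np.zeros(p)    # most recent value first
    via_markov, direct = [], []
    for e in eta:
        A, b = markov_matrices(omega, alpha, beta, e)
        z = b + A @ z
        sig2 = omega + alpha @ eps2_past + beta @ sig2_past
        eps2_past = np.concatenate([[sig2 * e**2], eps2_past[:-1]])
        sig2_past = np.concatenate([[sig2], sig2_past[:-1]])
        via_markov.append(z[q])
        direct.append(sig2)
    return np.array(via_markov), np.array(direct)

eta = np.random.default_rng(1).standard_normal(500)
m, d = compare_recursions(0.1, [0.1, 0.05], [0.6, 0.2], eta)
print(np.max(np.abs(m - d)))   # agreement up to rounding error
```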


Strict Stationarity 


The main tool for studying strict stationarity is the concept of the top Lyapunov exponent. Let $A$ be a $(p+q) \times (p+q)$ matrix. The spectral radius of $A$, denoted by $\rho(A)$, is defined as the greatest modulus of its eigenvalues. Let $\|\cdot\|$ denote any norm on the space of the $(p+q) \times (p+q)$ matrices. We have the following algebra result:

$$\lim_{t\to\infty}\frac{1}{t}\log\|A^t\| = \log\rho(A) \qquad (2.21)$$

(Exercise 2.3). This property has the following extension to random matrices.
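A quick numerical check of (2.21), for an arbitrary $2 \times 2$ matrix chosen only for illustration and the entrywise norm $\|C\| = \sum|c_{ij}|$ used later in the proof of Theorem 2.4: the quantity $t^{-1}\log\|A^t\|$ approaches $\log\rho(A)$, here with an error of order $1/t$.

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.3, 0.2]])
rho = np.max(np.abs(np.linalg.eigvals(A)))      # spectral radius rho(A)

def log_norm_rate(A, t):
    """(1/t) log ||A^t|| with the entrywise sum norm ||C|| = sum |c_ij|."""
    return np.log(np.abs(np.linalg.matrix_power(A, t)).sum()) / t

for t in (10, 100, 500):
    print(t, log_norm_rate(A, t), np.log(rho))
```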


Theorem 2.3 Let $\{A_t, t \in \mathbb{Z}\}$ be a strictly stationary and ergodic sequence of random matrices, such that $E\log^+\|A_t\|$ is finite. We have

$$\lim_{t\to\infty}\frac{1}{t}E(\log\|A_tA_{t-1}\cdots A_1\|) = \gamma = \inf_{t\in\mathbb{N}^*}\frac{1}{t}E(\log\|A_tA_{t-1}\cdots A_1\|), \qquad (2.22)$$

$\gamma$ is called the top Lyapunov exponent and $\exp(\gamma)$ is called the spectral radius of the sequence of matrices $\{A_t, t \in \mathbb{Z}\}$. Moreover,

$$\gamma = \lim_{t\to\infty}\frac{1}{t}\log\|A_tA_{t-1}\cdots A_1\| \quad \text{a.s.} \qquad (2.23)$$


Remark 2.4 (On the top Lyapunov exponent $\gamma$)
1. It is always true that $\gamma \le E(\log\|A_1\|)$, with equality in dimension 1.
2. If $A_t = A$ for all $t \in \mathbb{Z}$, we have $\gamma = \log\rho(A)$ in view of (2.21).
3. All norms on a finite-dimensional space being equivalent, it readily follows that $\gamma$ is independent of the norm chosen.
4. The equivalence between the definitions of $\gamma$ can be shown using Kingman's subadditive ergodic theorem (see Kingman, 1973, Theorem 6). The characterization in (2.23) is particularly interesting because it allows us to evaluate this coefficient by simulation. Asymptotic confidence intervals can also be obtained (see Goldsheid, 1991).
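Characterization (2.23) suggests a simple estimator of $\gamma$ from one long simulated path. The sketch below is a generic version (the function name and its `sample_matrix` argument are hypothetical): it renormalizes the running product at each step and accumulates the logarithms of the norms, which leaves $t^{-1}\log\|A_t\cdots A_1\|$ unchanged while avoiding underflow and overflow. It gives a point estimate only; no confidence interval is computed.

```python
import numpy as np

def top_lyapunov(sample_matrix, t, rng):
    """Estimate gamma by (1/t) log ||A_t ... A_1|| along one simulated path, cf. (2.23).

    sample_matrix(rng) draws one matrix A_k. The running product is divided by its
    norm at each step and the log-norms are accumulated, so nothing under/overflows.
    """
    P = sample_matrix(rng)                 # A_1
    log_norm = 0.0
    for _ in range(t - 1):
        s = np.abs(P).sum()                # sum norm; any norm gives the same limit
        log_norm += np.log(s)
        P = sample_matrix(rng) @ (P / s)   # new matrix multiplies on the left
    return (log_norm + np.log(np.abs(P).sum())) / t

# Usage: GARCH(1, 1), for which A_t = (eta_t^2, 1)'(alpha, beta)
rng = np.random.default_rng(0)
alpha, beta = 0.1, 0.85

def garch11_matrix(rng):
    eta2 = rng.standard_normal() ** 2
    return np.array([[alpha * eta2, beta * eta2], [alpha, beta]])

print(top_lyapunov(garch11_matrix, 50_000, rng))
```

For the GARCH(1, 1) the printed value should be close to $E\log(\alpha\eta_t^2 + \beta)$, in line with Example 2.1 below.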


The following general lemma, which we shall state without proof (see Bougerol and Picard, 1992a, 
Lemma 3.4), is very useful for studying products of random matrices. 


Lemma 2.1 Let $\{A_t, t \in \mathbb{Z}\}$ be an ergodic and strictly stationary sequence of random matrices such that $E\log^+\|A_t\|$ is finite, endowed with a top Lyapunov exponent $\gamma$. Then

$$\lim_{t\to\infty}\|A_0\cdots A_{-t}\| = 0 \ \text{a.s.} \implies \gamma < 0. \qquad (2.24)$$


As for ARMA models, we are mostly interested in the nonanticipative solutions $(\epsilon_t)$ to model (2.5), that is, those for which $\epsilon_t$ belongs to the σ-field generated by $\{\eta_t, \eta_{t-1}, \dots\}$.


Theorem 2.4 (Strict stationarity of the GARCH(p, q) model) A necessary and sufficient condition for the existence of a strictly stationary solution to the GARCH(p, q) model (2.5) is that

$$\gamma < 0,$$

where $\gamma$ is the top Lyapunov exponent of the sequence $\{A_t, t \in \mathbb{Z}\}$ defined by (2.17). When the strictly stationary solution exists, it is unique, nonanticipative and ergodic.


Proof. We shall use the norm defined by $\|A\| = \sum|a_{ij}|$. For convenience, the norm will be denoted identically whatever the dimension of $A$. With this convention, this norm is clearly multiplicative:⁶ $\|AB\| \le \|A\|\,\|B\|$ for all matrices $A$ and $B$ such that $AB$ exists. Observe that since the variable $\eta_t$ has a finite variance, the components of the matrix $A_t$ are integrable. Hence, we have

$$E\log^+\|A_t\| \le E\|A_t\| < \infty.$$


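First suppose $\gamma < 0$. Then it follows from (2.23) that

$$\tilde{\underline{z}}_t(N) = \underline{b}_t + \sum_{n=0}^{N} A_tA_{t-1}\cdots A_{t-n}\underline{b}_{t-n-1}$$

converges almost surely, when $N$ goes to infinity, to some limit $\tilde{\underline{z}}_t$. Indeed, using the fact that the norm is multiplicative,

$$\|\tilde{\underline{z}}_t(N)\| \le \|\underline{b}_t\| + \sum_{n=0}^{\infty}\|A_tA_{t-1}\cdots A_{t-n}\|\,\|\underline{b}_{t-n-1}\| \qquad (2.25)$$

and

$$\|A_t\cdots A_{t-n}\|^{1/n}\,\|\underline{b}_{t-n-1}\|^{1/n} = \exp\left\{\frac{1}{n}\log\|A_t\cdots A_{t-n}\| + \frac{1}{n}\log\|\underline{b}_{t-n-1}\|\right\} \to e^{\gamma} < 1 \quad \text{a.s.}$$

To show that $n^{-1}\log\|\underline{b}_{t-n-1}\| \to 0$ a.s. we have used the result proven in Exercise 2.11, which can be applied because

$$E\left|\log\|\underline{b}_{t-n-1}\|\right| \le |\log\omega| + E\log^+\|\underline{b}_{t-n-1}\| \le |\log\omega| + E\|\underline{b}_{t-n-1}\| < \infty.$$

It follows that, by the Cauchy rule, $\tilde{\underline{z}}_t$ is well defined in $(\mathbb{R}_+^*)^{p+q}$. Let $\tilde\sigma_t^2$ denote the $(q+1)$th component of $\tilde{\underline{z}}_t$. Setting $\epsilon_t = \tilde\sigma_t\eta_t$, we define a solution of model (2.5). This solution is nonanticipative since, by (2.18), $\epsilon_t$ can be expressed as a measurable function of $\eta_t, \eta_{t-1}, \dots$. By Theorem A.1, together with the ergodicity of $(\eta_t)$, this solution is also strictly stationary and ergodic.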

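The proof of the uniqueness parallels the arguments given in the case $p = q = 1$. Let $(\epsilon_t)$ denote a strictly stationary solution of model (2.5), or equivalently, let $(\underline{z}_t)$ denote a positive and strictly stationary solution of (2.16). For all $N \ge 0$, iterating (2.16) gives

$$\underline{z}_t = \tilde{\underline{z}}_t(N) + A_tA_{t-1}\cdots A_{t-N-1}\underline{z}_{t-N-2}.$$

Then

$$\|\underline{z}_t - \tilde{\underline{z}}_t\| \le \|\tilde{\underline{z}}_t(N) - \tilde{\underline{z}}_t\| + \|A_t\cdots A_{t-N-1}\|\,\|\underline{z}_{t-N-2}\|.$$

The first term on the right-hand side tends to 0 a.s. as $N \to \infty$. In addition, because the series defining $\tilde{\underline{z}}_t$ converges a.s., we have $\|A_t\cdots A_{t-N-1}\| \to 0$ with probability 1 when $N \to \infty$. Moreover, the distribution of $\|\underline{z}_{t-N-2}\|$ is independent of $N$ by stationarity. It follows that $\|A_t\cdots A_{t-N-1}\|\,\|\underline{z}_{t-N-2}\| \to 0$ in probability as $N \to \infty$. We have shown that $\underline{z}_t - \tilde{\underline{z}}_t \to 0$ in probability when $N \to \infty$. This quantity being independent of $N$, we necessarily have $\tilde{\underline{z}}_t = \underline{z}_t$ for any $t$, a.s.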


6 Other examples of multiplicative norms are the Euclidean norm, $\|A\| = \{\sum_{i,j}a_{ij}^2\}^{1/2} = \{\mathrm{Tr}(A'A)\}^{1/2}$, and the sup norm defined, for any matrix $A$ of size $d \times d$, by $N(A) = \sup\{\|Ax\|;\ x \in \mathbb{R}^d, \|x\| \le 1\}$ where $\|x\| = \sum|x_i|$. A nonmultiplicative norm is $N_1$, defined by $N_1(A) = \max|a_{ij}|$.




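Next we establish the necessary part. From Lemma 2.1, it suffices to prove (2.24). We shall show that, for $1 \le i \le p + q$,

$$\lim_{t\to\infty}A_0\cdots A_{-t}e_i = 0, \quad \text{a.s.}, \qquad (2.26)$$

where $e_i$ is the $i$th element of the canonical base of $\mathbb{R}^{p+q}$. Let $(\epsilon_t)$ be a strictly stationary solution of (2.5) and let $(\underline{z}_t)$ be defined by (2.16). We have, for $t > 0$,

$$\underline{z}_0 = \underline{b}_0 + \sum_{k=0}^{t-1}A_0\cdots A_{-k}\underline{b}_{-k-1} + A_0\cdots A_{-t}\underline{z}_{-t-1} \ge \sum_{k=0}^{t-1}A_0\cdots A_{-k}\underline{b}_{-k-1}, \qquad (2.27)$$

because the coefficients of the matrices $A_t$, $\underline{b}_0$ and $\underline{z}_t$ are nonnegative.⁷ It follows that the series $\sum_{k=0}^{\infty}A_0\cdots A_{-k}\underline{b}_{-k-1}$ converges and thus $A_0\cdots A_{-k}\underline{b}_{-k-1}$ tends almost surely to 0 as $k \to \infty$. But since $\underline{b}_{-k-1} = \omega\eta_{-k-1}^2e_1 + \omega e_{q+1}$, it follows that $A_0\cdots A_{-k}\underline{b}_{-k-1}$ can be decomposed into two positive terms, and we have

$$\lim_{k\to\infty}A_0\cdots A_{-k}\,\omega\eta_{-k-1}^2e_1 = 0, \qquad \lim_{k\to\infty}A_0\cdots A_{-k}\,\omega e_{q+1} = 0, \quad \text{a.s.} \qquad (2.28)$$

Since $\omega \ne 0$, (2.26) holds for $i = q + 1$. Now we use the equality

$$A_{-k}e_{q+i} = \beta_i\eta_{-k}^2e_1 + \beta_ie_{q+1} + e_{q+i+1}, \quad i = 1, \dots, p, \qquad (2.29)$$

with $e_{p+q+1} = 0$ by convention. For $i = 1$, this equality gives

$$0 = \lim_{k\to\infty}A_0\cdots A_{-k}e_{q+1} \ge \lim_{k\to\infty}A_0\cdots A_{-k+1}e_{q+2} \ge 0,$$

hence (2.26) is true for $i = q + 2$, and, by induction, it is also true for $i = q + j$, $j = 1, \dots, p$, using (2.29). Moreover, we note that $A_{-k}e_q = \alpha_q\eta_{-k}^2e_1 + \alpha_qe_{q+1}$, which allows us to see that, from (2.28), (2.26) holds for $i = q$. For the other values of $i$ the conclusion follows from

$$A_{-k}e_i = \alpha_i\eta_{-k}^2e_1 + \alpha_ie_{q+1} + e_{i+1}, \quad i = 1, \dots, q - 1,$$

and an ascending recursion. The proof of Theorem 2.4 is now complete.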


Remark 2.5 (On Theorem 2.4 and its proof)

1. This theorem shows, in particular, that the stationary solution of the strong GARCH model is a semi-strong GARCH process, in the sense of Definition 2.1. The converse is not true, however (see the example of Section 4.1.1 below).

2. Bougerol and Picard (1992b) use a more parsimonious vector representation of the GARCH(p, q) model, based on the vector $\underline{z}_t^* = (\sigma_t^2, \dots, \sigma_{t-p+1}^2, \epsilon_{t-1}^2, \dots, \epsilon_{t-q+1}^2)' \in \mathbb{R}^{p+q-1}$ (Exercise 2.6). However, a drawback of this representation is that it is only defined for $p \ge 1$ and $q \ge 2$.


7 Here, and in the sequel, we use the notation $x \ge y$, meaning that all the components of the vector $x$ are greater than, or equal to, those of the vector $y$.
3. An analogous proof uses the following Markov vector representation based on (2.6):

$$\underline{h}_t = \underline{\omega} + B_t\underline{h}_{t-1}, \qquad (2.30)$$

with $\underline{\omega} = (\omega, 0, \dots, 0)' \in \mathbb{R}^r$, $\underline{h}_t = (\sigma_t^2, \dots, \sigma_{t-r+1}^2)' \in \mathbb{R}^r$ and

$$B_t = \begin{pmatrix} a_1(\eta_{t-1}) & \cdots & a_{r-1}(\eta_{t-r+1}) & a_r(\eta_{t-r}) \\ & I_{r-1} & & 0 \end{pmatrix},$$

where $I_{r-1}$ is the identity matrix of size $r - 1$. Note that, unlike the $A_t$, the matrices $B_t$ are not independent. It is worth noting (Exercise 2.12), however, that

$$E(B_kB_{k-1}\cdots B_0) = E(B_k)\,E(B_{k-1})\cdots E(B_0), \qquad k \ge 0. \qquad (2.31)$$

The independence of the matrices $A_t$ has been explicitly used in the proof of the necessary part of the previous theorem, because it was required to apply Lemma 2.1. Moreover, the independence of the $A_t$ will be crucial to obtain conditions for the existence of moments. We shall see, however, that the representation (2.30) may be more convenient than (2.16) for deriving other properties such as the mixing properties (see Chapter 3).


4. To verify the condition $\gamma < 0$, it is sufficient to check that

$$E(\log\|A_tA_{t-1}\cdots A_1\|) < 0$$

for some $t > 0$.

5. If a GARCH model admits a strictly stationary solution, any other GARCH model obtained by replacing the $\alpha_i$ and $\beta_j$ by smaller coefficients will also admit a strictly stationary solution. Indeed, the coefficient $\gamma$ of the latter model will be smaller than that of the initial model because, with the norm chosen, $0 \le A \le B$ implies $\|A\| \le \|B\|$. In particular, the strict stationarity of a given GARCH process entails that the ARCH process obtained by canceling the coefficients $\beta_j$ is also strictly stationary.


6. We emphasize the fact that any strictly stationary solution of a GARCH model is nonantic- 
ipative. This is an important difference with ARMA models, for which strictly stationary 
solutions depending on both past and future values of the noise exist. 


The following result provides a simple necessary condition for strict stationarity, in three different 
forms. 


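Corollary 2.2 (Consequences of strict stationarity) Let $\gamma$ be the top Lyapunov exponent of the sequence $\{A_t, t \in \mathbb{Z}\}$ defined in (2.17). If $\gamma < 0$, we have the following equivalent properties:

(a) $\sum_{j=1}^{p}\beta_j < 1$;

(b) $1 - \beta_1z - \cdots - \beta_pz^p = 0 \implies |z| > 1$;

(c) $\rho(B) < 1$, where $B$ is the submatrix of $A_t$ defined by

$$B = \begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_p \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}.$$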
Proof. Since all the coefficients of the matrices $A_t$ are nonnegative, it is clear that $\gamma$ is larger than the top Lyapunov exponent of the constant sequence obtained by replacing by 0 the coefficients of the first $q$ rows and of the first $q$ columns of the matrices $A_t$. But the matrix obtained in this way has the same nonzero eigenvalues as $B$, and thus has the same spectral radius as $B$. In view of Remark 2.4(2), it can be seen that

$$\gamma \ge \log\rho(B).$$

It follows that $\gamma < 0 \implies$ (c). It is easy to show (by induction on $p$ and by computing the determinant with respect to the last column) that, for $\lambda \ne 0$,

$$\det(\lambda I_p - B) = \lambda^p - \lambda^{p-1}\beta_1 - \cdots - \lambda\beta_{p-1} - \beta_p = \lambda^p\mathcal{B}\left(\frac{1}{\lambda}\right), \qquad (2.32)$$

where $\mathcal{B}(z) = 1 - \beta_1z - \cdots - \beta_pz^p$. The equivalence between (b) and (c) is then straightforward. Next we prove that (a) $\iff$ (b). We have $\mathcal{B}(0) = 1$ and $\mathcal{B}(1) = 1 - \sum_{j=1}^p\beta_j$. Hence, if $\sum_{j=1}^p\beta_j \ge 1$ then $\mathcal{B}(1) \le 0$ and, by a continuity argument, there exists a root of $\mathcal{B}$ in $(0, 1]$. Thus (b) $\implies$ (a). Conversely, if $\sum_{j=1}^p\beta_j < 1$ and $\mathcal{B}(z_0) = 0$ for a $z_0$ of modulus less than or equal to 1, then $1 = \sum_{j=1}^p\beta_jz_0^j = \left|\sum_{j=1}^p\beta_jz_0^j\right| \le \sum_{j=1}^p\beta_j|z_0|^j \le \sum_{j=1}^p\beta_j$, which is impossible. It follows that (a) $\implies$ (b) and the proof of the corollary is complete.
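The three equivalent conditions of Corollary 2.2 are easy to check numerically for a given vector $(\beta_1, \dots, \beta_p)$ of nonnegative coefficients. The sketch below (helper names are hypothetical) evaluates (a) directly, (b) through the roots of $\mathcal{B}(z)$, and (c) through the eigenvalues of $B$; for nonnegative $\beta_j$ the three booleans should agree, up to rounding when $\sum_j\beta_j$ is extremely close to 1.

```python
import numpy as np

def companion(beta):
    """Matrix B of Corollary 2.2: first row beta, ones on the subdiagonal."""
    p = len(beta)
    B = np.zeros((p, p))
    B[0, :] = beta
    B[1:, :-1] = np.eye(p - 1)
    return B

def check_corollary_2_2(beta):
    """Return the truth values of conditions (a), (b) and (c)."""
    beta = np.asarray(beta, float)
    a = bool(beta.sum() < 1)
    # np.roots expects coefficients from the highest degree: -beta_p, ..., -beta_1, 1
    roots = np.roots(np.concatenate([-beta[::-1], [1.0]]))
    b = bool(np.all(np.abs(roots) > 1))
    c = bool(np.max(np.abs(np.linalg.eigvals(companion(beta)))) < 1)
    return a, b, c

print(check_corollary_2_2([0.3, 0.5]), check_corollary_2_2([0.7, 0.6]))
```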


We now give two illustrations allowing us to obtain more explicit stationarity conditions than 
in the theorem. 


Example 2.1 (GARCH(1, 1)) In the GARCH(1, 1) case, we retrieve the strict stationarity condition already obtained. The matrix $A_t$ is written in this case as

$$A_t = (\eta_t^2, 1)'(\alpha_1, \beta_1).$$

We thus have

$$A_tA_{t-1}\cdots A_1 = \prod_{k=1}^{t-1}(\alpha_1\eta_{t-k}^2 + \beta_1)\,A_t.$$

It follows that

$$\log\|A_tA_{t-1}\cdots A_1\| = \sum_{k=1}^{t-1}\log(\alpha_1\eta_{t-k}^2 + \beta_1) + \log\|A_t\|$$

and, in view of (2.23) and by the strong law of large numbers, $\gamma = E\log(\alpha_1\eta_t^2 + \beta_1)$. The necessary and sufficient condition for the strict stationarity is then $E\log(\alpha_1\eta_t^2 + \beta_1) < 0$, as obtained above.


Example 2.2 (ARCH(2)) For an ARCH(2) model, the matrix $A_t$ takes the form

$$A_t = \begin{pmatrix} \alpha_1\eta_t^2 & \alpha_2\eta_t^2 \\ 1 & 0 \end{pmatrix}$$

and the stationarity region can be evaluated by simulation. Table 2.1 shows, for different values of the coefficients $\alpha_1$ and $\alpha_2$, the empirical means and the standard deviations (in parentheses) obtained for 1000 simulations of size 1000 of $\hat\gamma = \frac{1}{1000}\log\|A_{1000}A_{999}\cdots A_1\|$. The $\eta_t$'s have been simulated from a $\mathcal{N}(0, 1)$ distribution. Note that in the ARCH(1) case, simulation provides a good approximation of the condition $\alpha_1 < 3.56$, which was obtained analytically. Apart from this case, there exists no explicit strict stationarity condition in terms of the coefficients $\alpha_1$ and $\alpha_2$.
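The simulation behind Table 2.1 can be sketched as follows. This is not the authors' code: it assumes $\eta_t \sim \mathcal{N}(0, 1)$ and the sum norm, and uses the fact that for a matrix $M$ with nonnegative entries this norm equals the sum of the entries of $M(1, 1)'$, so that a renormalized vector can be propagated instead of the full product. All paths are simulated in parallel; the function name and seed are illustrative.

```python
import numpy as np

def arch2_gamma_hat(alpha1, alpha2, t=1000, n_paths=1000, seed=0):
    """Mean and std over n_paths of gamma_hat = (1/t) log ||A_t ... A_1|| for ARCH(2).

    A_k = [[alpha1*eta_k^2, alpha2*eta_k^2], [1, 0]], eta_k ~ N(0, 1), sum norm.
    For nonnegative M, ||M|| = sum of the entries of M @ (1, 1)'.
    """
    rng = np.random.default_rng(seed)
    v = np.full((n_paths, 2), 0.5)              # (1, 1)' rescaled to sum 1
    log_norm = np.full(n_paths, np.log(2.0))    # compensates that rescaling
    for _ in range(t):
        eta2 = rng.standard_normal(n_paths) ** 2
        first = eta2 * (alpha1 * v[:, 0] + alpha2 * v[:, 1])
        v = np.column_stack([first, v[:, 0]])   # v <- A_k v
        s = v.sum(axis=1)
        log_norm += np.log(s)
        v /= s[:, None]                         # renormalize each path
    g = log_norm / t
    return g.mean(), g.std(ddof=1)

for a1, a2 in [(3.4, 0.0), (1.7, 0.5), (0.25, 1.75)]:
    print(a1, a2, arch2_gamma_hat(a1, a2))
```

For $\alpha_2 = 0$ the model is an ARCH(1), whose exact exponent is $\log\alpha_1 - C - \log 2$ with $C$ Euler's constant; for $\alpha_1 = 3.4$ this is about $-0.047$, and the standard deviation across paths should be close to $(\pi^2/2)^{1/2}/\sqrt{1000} \approx 0.07$.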


Figure 2.9, constructed from these simulations, gives a more precise idea of the strict stationarity 
region for an ARCH(2) process. We shall establish in Corollary 2.3 a result showing that any strictly 
stationary GARCH process admits small-order moments. We begin with two lemmas which are of 
independent interest. 
Table 2.1 Estimations of $\gamma$ obtained from 1000 simulations of size 1000 in the ARCH(2) case (empirical mean, standard deviation in parentheses; empty cells omitted).

α2 = 0: α1 = 3.4: −0.049 (0.071); α1 = 3.5: −0.018 (0.071); α1 = 3.6: 0.010 (0.071)
α2 = 0.5: α1 = 1.2: −0.175 (0.040); α1 = 1.7: −0.021 (0.042); α1 = 1.8: 0.006 (0.044)
α2 = 1: α1 = 1: −0.011 (0.038); α1 = 1.2: 0.046 (0.038)
α2 = 1.75: α1 = 0.25: −0.015 (0.035); α1 = 0.3: 0.001 (0.032)

Figure 2.9 Stationarity regions for the ARCH(2) model, in the $(\alpha_1, \alpha_2)$ plane: 1, second-order stationarity; 1 and 2, strict stationarity; 3, nonstationarity.


Lemma 2.2 Let $X$ denote an almost surely positive real random variable. If $EX^r < \infty$ for some $r > 0$ and if $E\log X < 0$, then there exists $s > 0$ such that $EX^s < 1$.

Proof. The moment-generating function of $Y = \log X$ is defined by $M(u) = Ee^{uY} = EX^u$. The function $M$ is defined and continuous on $[0, r]$ and we have, for $u > 0$,

$$\frac{M(u) - M(0)}{u} = \int g(u, y)\,dP_Y(y)$$

with

$$g(u, y) = \frac{e^{uy} - 1}{u} \downarrow y \quad \text{when } u \downarrow 0.$$

By Beppo Levi's theorem, the right derivative of $M$ at 0 is

$$\int y\,dP_Y(y) = E(\log X) < 0.$$

Since $M(0) = 1$, there exists $s > 0$ such that $M(s) = EX^s < 1$.
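Lemma 2.2 can be illustrated with $X = \alpha\eta_t^2$, $\eta_t \sim \mathcal{N}(0, 1)$, for which both $E\log X = \log\alpha + E\log\eta_t^2$ and $EX^s = \alpha^s2^s\Gamma(s + 1/2)/\sqrt{\pi}$ are explicit. The sketch below (hypothetical helper names) scans a grid of $s \in (0, 1]$. For $\alpha = 2.5$ we have $EX = 2.5 > 1$ but $E\log X < 0$, and the scan finds values of $s$ with $EX^s < 1$, as the lemma guarantees; for $\alpha = 4$, $E\log X > 0$ and no such $s$ exists.

```python
import math

def moment_alpha_eta2(alpha, s):
    """E[(alpha * eta^2)^s] = alpha^s 2^s Gamma(s + 1/2) / sqrt(pi), eta ~ N(0, 1)."""
    return math.exp(s * math.log(2.0 * alpha) + math.lgamma(s + 0.5) - math.lgamma(0.5))

def find_s(alpha, grid=None):
    """Smallest s on a grid of (0, 1] with E X^s < 1, or None if there is none."""
    grid = grid or [k / 1000 for k in range(1, 1001)]
    for s in grid:
        if moment_alpha_eta2(alpha, s) < 1.0:
            return s
    return None

print(find_s(2.5), find_s(4.0))   # 0.001 None
```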
The following result, which is stated for any sequence of positive iid matrices, provides another characterization of the strict stationarity of GARCH models.

Lemma 2.3 Let $\{A_t\}$ be an iid sequence of positive matrices, with top Lyapunov exponent $\gamma$. Then

$$\gamma < 0 \iff \exists s > 0,\ \exists k_0 \ge 1,\quad \delta := E(\|A_{k_0}A_{k_0-1}\cdots A_1\|^s) < 1.$$

Proof. Suppose $\gamma < 0$. Since $\gamma = \inf_k\frac{1}{k}E(\log\|A_kA_{k-1}\cdots A_1\|)$, there exists $k_0 \ge 1$ such that $E(\log\|A_{k_0}A_{k_0-1}\cdots A_1\|) < 0$. Moreover,

$$E(\|A_{k_0}A_{k_0-1}\cdots A_1\|) = \|E(A_{k_0}A_{k_0-1}\cdots A_1)\| = \|(EA_1)^{k_0}\| \le (E\|A_1\|)^{k_0} < \infty \qquad (2.33)$$

using the multiplicative norm $\|A\| = \sum_{i,j}|A(i, j)|$, the positivity of the elements of the $A_i$, and the independence and equidistribution of the $A_i$. We conclude, concerning the direct implication, by using Lemma 2.2. The converse does not use the fact that the sequence is iid. If there exist $s > 0$ and $k_0 \ge 1$ such that $\delta < 1$, we have, by Jensen's inequality,

$$\gamma \le \frac{1}{k_0}E(\log\|A_{k_0}A_{k_0-1}\cdots A_1\|) \le \frac{1}{sk_0}\log\delta < 0.$$


Corollary 2.3 Let $\gamma$ denote the top Lyapunov exponent of the sequence $(A_t)$ defined in (2.17). Then

$$\gamma < 0 \implies \exists s > 0,\quad E\sigma_t^{2s} < \infty,\quad E|\epsilon_t|^{2s} < \infty,$$

where $\epsilon_t = \sigma_t\eta_t$ is the strictly stationary solution of the GARCH(p, q) model (2.5).

Proof. The proof of Lemma 2.3 shows that the real $s$ involved in the two previous lemmas can be taken to be less than 1. For $s \in (0, 1]$ and $a, b > 0$ we have $\left(\frac{a}{a+b}\right)^s + \left(\frac{b}{a+b}\right)^s \ge 1$ and, consequently, $\left(\sum_iu_i\right)^s \le \sum_iu_i^s$ for any sequence of positive numbers $u_i$. Using this inequality, together with arguments already used (in particular, the fact that the norm is multiplicative), we deduce that the stationary solution defined in (2.18) satisfies

$$E\|\underline{z}_t\|^s \le E\|\underline{b}_1\|^s\left\{1 + \sum_{k=0}^{\infty}\delta^k\sum_{i=1}^{k_0}\left(E\|A_1\|^s\right)^i\right\} < \infty,$$

where $\delta$ is defined in Lemma 2.3. We conclude by noting that $\sigma_t^{2s} \le \|\underline{z}_t\|^s$ and $|\epsilon_t|^{2s} \le \|\underline{z}_t\|^s$.


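Using Lemma 2.3 and Corollary 2.3 together, it can be seen that for $s \in (0, 1]$,

$$\exists k_0 \ge 1,\quad E(\|A_{k_0}A_{k_0-1}\cdots A_1\|^s) < 1 \implies E|\epsilon_t|^{2s} < \infty. \qquad (2.34)$$

The converse is generally true. For instance, we have for $s \in (0, 1]$,

$$\alpha_1 + \beta_1 > 0,\quad E|\epsilon_t|^{2s} < \infty \implies \lim_{k\to\infty}E(\|A_kA_{k-1}\cdots A_1\|^s) = 0 \qquad (2.35)$$

(Exercise 2.13).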
Second-Order Stationarity


The following theorem gives necessary and sufficient second-order stationarity conditions. 


Theorem 2.5 (Second-order stationarity) If there exists a GARCH(p, q) process, in the sense of Definition 2.1, which is second-order stationary and nonanticipative, and if $\omega > 0$, then

$$\sum_{i=1}^{q}\alpha_i + \sum_{j=1}^{p}\beta_j < 1. \qquad (2.36)$$

Conversely, if (2.36) holds, the unique strictly stationary solution of model (2.5) is a weak white noise (and thus is second-order stationary). In addition, there exists no other second-order stationary solution.


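Proof. We first show that condition (2.36) is necessary. Let $(\epsilon_t)$ be a second-order stationary and nonanticipative GARCH(p, q) process. Then

$$E(\epsilon_t^2) = E\{E(\epsilon_t^2 \mid \epsilon_u, u < t)\} = E(\sigma_t^2)$$

is a positive real number which does not depend on $t$. Taking the expectation of both sides of (2.1), we thus have

$$E(\epsilon_t^2) = \omega + \sum_{i=1}^{q}\alpha_iE(\epsilon_t^2) + \sum_{j=1}^{p}\beta_jE(\epsilon_t^2),$$

that is,

$$\left(1 - \sum_{i=1}^{q}\alpha_i - \sum_{j=1}^{p}\beta_j\right)E(\epsilon_t^2) = \omega. \qquad (2.37)$$

Since $\omega$ is strictly positive, we must have (2.36).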
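Now suppose that (2.36) holds true and let us construct a stationary GARCH solution (in the sense of Definition 2.2). For $t, k \in \mathbb{Z}$, we define $\mathbb{R}^{p+q}$-valued vectors as follows:

$$\underline{z}_k(t) = \begin{cases} 0 & \text{if } k < 0, \\ \underline{b}_t + A_t\underline{z}_{k-1}(t-1) & \text{if } k \ge 0. \end{cases}$$

We have

$$\underline{z}_k(t) - \underline{z}_{k-1}(t) = \begin{cases} 0 & \text{if } k < 0, \\ \underline{b}_t & \text{if } k = 0, \\ A_t\{\underline{z}_{k-1}(t-1) - \underline{z}_{k-2}(t-1)\} & \text{if } k > 0. \end{cases}$$

By iterating these relations we get, for $k > 0$,

$$\underline{z}_k(t) - \underline{z}_{k-1}(t) = A_tA_{t-1}\cdots A_{t-k+1}\underline{b}_{t-k}.$$

On the other hand, for the norm $\|C\| = \sum_{i,j}|c_{ij}|$, we have, for any random matrix $C$ with positive coefficients, $E\|C\| = E\sum_{i,j}|c_{ij}| = E\sum_{i,j}c_{ij} = \|E(C)\|$. Hence, for $k > 0$,

$$E\|\underline{z}_k(t) - \underline{z}_{k-1}(t)\| = \|E(A_tA_{t-1}\cdots A_{t-k+1}\underline{b}_{t-k})\|,$$

because the matrix $A_tA_{t-1}\cdots A_{t-k+1}\underline{b}_{t-k}$ is positive. All terms of the product $A_tA_{t-1}\cdots A_{t-k+1}\underline{b}_{t-k}$ are independent (because the process $(\eta_t)$ is iid and every term of the product is a function of a variable $\eta_{t-i}$, the dates $t - i$ being distinct). Moreover, $A := E(A_t)$ and $\underline{b} := E(\underline{b}_t)$ do not depend on $t$. Finally, for $k > 0$,

$$E\|\underline{z}_k(t) - \underline{z}_{k-1}(t)\| = \|A^k\underline{b}\| = (1, \dots, 1)A^k\underline{b}$$

because all terms of the vector $A^k\underline{b}$ are positive.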
Condition (2.36) implies that the modulus of the eigenvalues of $A$ are strictly less than 1. Indeed, one can verify (Exercise 2.10) that

$$\det(\lambda I_{p+q} - A) = \lambda^{p+q}\left(1 - \sum_{i=1}^{q}\alpha_i\lambda^{-i} - \sum_{j=1}^{p}\beta_j\lambda^{-j}\right). \qquad (2.38)$$

Thus if $|\lambda| \ge 1$, using the inequality $|a - b| \ge |a| - |b|$, we get

$$|\det(\lambda I_{p+q} - A)| \ge \left|1 - \sum_{i=1}^{q}\alpha_i\lambda^{-i} - \sum_{j=1}^{p}\beta_j\lambda^{-j}\right| \ge 1 - \sum_{i=1}^{q}\alpha_i - \sum_{j=1}^{p}\beta_j > 0,$$

and then $\rho(A) < 1$. It follows that, in view of the Jordan decomposition, or using (2.21), the convergence $A^k \to 0$ holds at exponential rate when $k \to \infty$. Hence, for any fixed $t$, $\underline{z}_k(t)$ converges both in the $L^1$ sense, using Cauchy's criterion, and almost surely as $k \to \infty$. Let $\underline{z}_t$ denote the limit of $(\underline{z}_k(t))_k$. At fixed $k$, the process $(\underline{z}_k(t))_{t\in\mathbb{Z}}$ is strictly stationary. The limit process $(\underline{z}_t)$ is thus strictly stationary. Finally, it is clear that $\underline{z}_t$ is a solution of equation (2.16).

The uniqueness can be shown as in the case p = q = 1, using the representation (2.30). 


Remark 2.6 (On the second-order stationarity of GARCH)
1. Under the conditions of Theorem 2.5, the unique stationary solution of model (2.5) is, using (2.37), a white noise of variance

$$\operatorname{Var}(\epsilon_t) = \frac{\omega}{1 - \sum_{i=1}^{q}\alpha_i - \sum_{j=1}^{p}\beta_j}.$$

2. Because the conditions in Theorems 2.4 and 2.5 are necessary and sufficient, we necessarily have

$$\sum_{i=1}^{q}\alpha_i + \sum_{j=1}^{p}\beta_j < 1 \implies \gamma < 0,$$

since the second-order stationary solution of Theorem 2.5 is also strictly stationary. One can directly check this implication by noting that if (2.36) is true, the previous proof shows that the spectral radius $\rho(EA_t)$ is strictly less than 1. Moreover, using a result by Kesten and Spitzer (1984, (1.4)), we always have

$$\gamma \le \log\rho(EA_t). \qquad (2.39)$$
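Under (2.36), Remark 2.6 can also be checked numerically through the Markov representation: since $A_t$ is independent of $\underline{z}_{t-1}$, taking expectations in (2.16) gives $E\underline{z}_t = E\underline{b}_t + E(A_t)E\underline{z}_{t-1}$, so in the stationary regime $E\underline{z}_t = (I - EA_t)^{-1}E\underline{b}_t$, whose first component must equal the variance above. The sketch uses hypothetical helper names and illustrative GARCH(2, 2) coefficients; it also computes $\rho(EA_t)$, which is strictly less than 1 under (2.36) and equal to 1 in the integrated case discussed next.

```python
import numpy as np

def mean_matrices(omega, alpha, beta):
    """E(A_t) and E(b_t) for the representation (2.17), using E(eta_t^2) = 1."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    q, p = len(alpha), len(beta)
    coeffs = np.concatenate([alpha, beta])
    A = np.zeros((p + q, p + q))
    A[0, :] = coeffs                 # E(alpha_i eta_t^2) = alpha_i, same for beta_j
    A[q, :] = coeffs
    for i in range(1, q):
        A[i, i - 1] = 1.0
    for j in range(1, p):
        A[q + j, q + j - 1] = 1.0
    b = np.zeros(p + q)
    b[0] = b[q] = omega
    return A, b

omega, alpha, beta = 0.1, [0.1, 0.05], [0.6, 0.2]
A, b = mean_matrices(omega, alpha, beta)
rho = np.max(np.abs(np.linalg.eigvals(A)))
Ez = np.linalg.solve(np.eye(len(b)) - A, b)     # stationary mean of z_t
var_formula = omega / (1 - sum(alpha) - sum(beta))
print(rho, Ez[0], var_formula)
```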


IGARCH(p, q) Processes 
When

$$\sum_{i=1}^{q}\alpha_i + \sum_{j=1}^{p}\beta_j = 1,$$

the model is called an integrated GARCH(p, q) or IGARCH(p, q) model (see Engle and Bollerslev, 1986). This name is justified by the existence of a unit root in the autoregressive part of representation (2.4) and is introduced by analogy with the integrated ARMA models, or ARIMA. However, this analogy can be misleading: there exists no (strict or second-order) stationary solution of an ARIMA model, whereas an IGARCH model admits a strictly stationary solution under very general conditions. In the case $p = q = 1$, the latter property is easily shown.


Corollary 2.4 (Strict stationarity of IGARCH(1, 1)) If $P[\eta_t^2 = 1] < 1$ and if $\alpha + \beta = 1$, model (2.7) admits a unique strictly stationary solution.

Proof. Recall that in the case $p = q = 1$, the matrices $A_t$ can be replaced by $a(\eta_t) = \alpha\eta_t^2 + \beta$. Hence, $\gamma = E\log a(\eta_t) \le \log E\{a(\eta_t)\} = 0$. The inequality is strict unless $a(\eta_t)$ is a.s. constant. Since $E\{a(\eta_t)\} = 1$, this constant can only be equal to 1. Thus $\eta_t^2 = 1$ a.s., which is excluded.


This property extends to the general case under slightly more restrictive conditions on the law of $\eta_t$.

Corollary 2.5 Suppose that the distribution of $\eta_t$ has an unbounded support and has no mass at 0. Then, if $\sum_{i=1}^{q}\alpha_i + \sum_{j=1}^{p}\beta_j = 1$, model (2.5) admits a unique strictly stationary solution.

Proof. It is not difficult to show from (2.38) that the spectral radius $\rho(A)$ of the matrix $A = EA_t$ is equal to 1 (Exercise 2.10). It can be shown that the assumptions on the distribution of $\eta_t$ imply that inequality (2.39) is strict, that is, $\gamma < \log\rho(A) = 0$ (see Kesten and Spitzer, 1984, Theorem 2; Bougerol and Picard, 1992b, Corollary 2.2). This allows us to conclude by Theorem 2.4.


Note that this strictly stationary solution has an infinite variance in view of Theorem 2.5. 


2.3 ARCH(∞) Representation*


A process $(\epsilon_t)$ is called an ARCH(∞) process if there exists a sequence of iid variables $(\eta_t)$ such that $E(\eta_t) = 0$ and $E(\eta_t^2) = 1$, and a sequence of constants $\phi_i \ge 0$, $i = 1, \dots$, and $\phi_0 > 0$ such that

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \phi_0 + \sum_{i=1}^{\infty}\phi_i\epsilon_{t-i}^2. \qquad (2.40)$$

This class obviously contains the ARCH(q) process and we shall see that it more generally contains the GARCH(p, q) process.


2.3.1 Existence Conditions 


The existence of a stationary ARCH(∞) process requires assumptions on the sequences $(\phi_i)$ and $(\eta_t)$. The following result gives an existence condition.


Theorem 2.6 (Existence of a stationary ARCH(∞) solution) For any $s \in (0, 1]$, let

$$A_s = \sum_{i=1}^{\infty}\phi_i^s \quad \text{and} \quad \mu_{2s} = E|\eta_t|^{2s}.$$

Then, if there exists $s \in (0, 1]$ such that

$$A_s\mu_{2s} < 1, \qquad (2.41)$$

then there exists a strictly stationary and nonanticipative solution to model (2.40), given by

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \phi_0 + \phi_0\sum_{k=1}^{\infty}\sum_{i_1,\dots,i_k\ge1}\phi_{i_1}\cdots\phi_{i_k}\eta_{t-i_1}^2\eta_{t-i_1-i_2}^2\cdots\eta_{t-i_1-\cdots-i_k}^2. \qquad (2.42)$$

The process $(\epsilon_t)$ defined by (2.42) is the unique strictly stationary and nonanticipative solution of model (2.40) such that $E|\epsilon_t|^{2s} < \infty$.


Proof. Consider the random variable

$$S_t = \phi_0 + \phi_0\sum_{k=1}^{\infty}\sum_{i_1,\dots,i_k\ge1}\phi_{i_1}\cdots\phi_{i_k}\eta_{t-i_1}^2\cdots\eta_{t-i_1-\cdots-i_k}^2, \qquad (2.43)$$

taking values in $[0, +\infty]$. Since $s \in (0, 1]$, applying the inequality $(a + b)^s \le a^s + b^s$ for $a, b \ge 0$ gives

$$S_t^s \le \phi_0^s + \phi_0^s\sum_{k=1}^{\infty}\sum_{i_1,\dots,i_k\ge1}\phi_{i_1}^s\cdots\phi_{i_k}^s|\eta_{t-i_1}|^{2s}\cdots|\eta_{t-i_1-\cdots-i_k}|^{2s}.$$

Using the independence of the $\eta_t$, it follows that

$$ES_t^s \le \phi_0^s + \phi_0^s\sum_{k=1}^{\infty}\sum_{i_1,\dots,i_k\ge1}\phi_{i_1}^s\cdots\phi_{i_k}^s\,E\left(|\eta_{t-i_1}|^{2s}\cdots|\eta_{t-i_1-\cdots-i_k}|^{2s}\right) = \phi_0^s\left\{1 + \sum_{k=1}^{\infty}(A_s\mu_{2s})^k\right\} = \frac{\phi_0^s}{1 - A_s\mu_{2s}}. \qquad (2.44)$$

This shows that $S_t$ is almost surely finite. All the summands being positive, we have

$$\begin{aligned} \sum_{i=1}^{\infty}\phi_iS_{t-i}\eta_{t-i}^2 &= \phi_0\sum_{i_0=1}^{\infty}\phi_{i_0}\eta_{t-i_0}^2 + \phi_0\sum_{i_0=1}^{\infty}\phi_{i_0}\eta_{t-i_0}^2\sum_{k=1}^{\infty}\sum_{i_1,\dots,i_k\ge1}\phi_{i_1}\cdots\phi_{i_k}\eta_{t-i_0-i_1}^2\cdots\eta_{t-i_0-i_1-\cdots-i_k}^2 \\ &= \phi_0\sum_{k=0}^{\infty}\sum_{i_0,\dots,i_k\ge1}\phi_{i_0}\cdots\phi_{i_k}\eta_{t-i_0}^2\cdots\eta_{t-i_0-\cdots-i_k}^2. \end{aligned}$$

Therefore, the following recursive equation holds:

$$S_t = \phi_0 + \sum_{i=1}^{\infty}\phi_iS_{t-i}\eta_{t-i}^2.$$

A strictly stationary and nonanticipative solution of (2.40) is then obtained by setting $\epsilon_t = S_t^{1/2}\eta_t$. Moreover, $E|\epsilon_t|^{2s} \le \mu_{2s}\phi_0^s/(1 - A_s\mu_{2s})$ in view of (2.44). Now denote by $(\epsilon_t)$ any strictly stationary and nonanticipative solution of model (2.40), such that $E|\epsilon_t|^{2s} < \infty$. For all $q \ge 1$, by $q$ successive substitutions of the $\epsilon_{t-i}^2$ we get

$$\begin{aligned} \sigma_t^2 &= \phi_0 + \phi_0\sum_{k=1}^{q}\sum_{i_1,\dots,i_k\ge1}\phi_{i_1}\cdots\phi_{i_k}\eta_{t-i_1}^2\cdots\eta_{t-i_1-\cdots-i_k}^2 \\ &\quad + \sum_{i_1,\dots,i_{q+1}\ge1}\phi_{i_1}\cdots\phi_{i_{q+1}}\eta_{t-i_1}^2\cdots\eta_{t-i_1-\cdots-i_q}^2\,\epsilon_{t-i_1-\cdots-i_{q+1}}^2 \\ &:= S_{t,q} + R_{t,q}. \end{aligned}$$


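Note that $S_{t,q} \to S_t$ a.s. when $q \to \infty$, where $S_t$ is defined in (2.43). Moreover, because the solution is nonanticipative, $\epsilon_t$ is independent of $\eta_{t'}$ for all $t' > t$. Hence,

$$ER_{t,q}^s \le \sum_{i_1,\dots,i_{q+1}\ge1}\phi_{i_1}^s\cdots\phi_{i_{q+1}}^s\,E\left(|\eta_{t-i_1}|^{2s}\cdots|\eta_{t-i_1-\cdots-i_q}|^{2s}\right)E|\epsilon_{t-i_1-\cdots-i_{q+1}}|^{2s} = (A_s\mu_{2s})^qA_sE|\epsilon_t|^{2s}.$$

Thus $\sum_{q\ge1}ER_{t,q}^s < \infty$ since $A_s\mu_{2s} < 1$. Finally, $R_{t,q} \to 0$ a.s. when $q \to \infty$, which implies $\sigma_t^2 = S_t$ a.s.

Equality (2.42) is called a Volterra expansion (see Priestley, 1988). It shows in particular that, under the conditions of the theorem, if $\phi_0 = 0$, the unique strictly stationary and nonanticipative solution of the ARCH(∞) model is the identically null sequence. An application of Theorem 2.6, obtained for $s = 1$, is that the condition

$$A_1 = \sum_{i=1}^{\infty}\phi_i < 1$$

ensures the existence of a second-order stationary ARCH(∞) process. If

$$(E\eta_t^4)^{1/2}\sum_{i=1}^{\infty}\phi_i < 1,$$

it can be shown (see Giraitis, Kokoszka and Leipus, 2000) that $E\epsilon_t^4 < \infty$, $\operatorname{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) > 0$ for all $h$ and

$$\sum_{h=-\infty}^{+\infty}\operatorname{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) < \infty. \qquad (2.45)$$

The fact that the squares have positive autocovariances will be verified later in the GARCH case (see Section 2.5). In contrast to GARCH processes, for which they decrease at exponential rate, the autocovariances of the squares of ARCH(∞) can decrease at the rate $h^{-\gamma}$ with $\gamma > 1$ arbitrarily close to 1. A strictly stationary process of the form (2.40) such that

$$A_1 = \sum_{i=1}^{\infty}\phi_i = 1$$

is called integrated ARCH(∞), or IARCH(∞). Notice that an IARCH(∞) process has infinite variance. Indeed, if $E\epsilon_t^2 = \sigma^2 < \infty$, then, by (2.40), $\sigma^2 = \phi_0 + \sigma^2$, which is impossible. From Theorem 2.8, the strictly stationary solutions of IGARCH models (see Corollary 2.5) admit IARCH(∞) representations. The next result provides a condition for the existence of IARCH(∞) processes.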


Theorem 2.7 (Existence of IARCH(∞) processes) If $A_1 = 1$, if $\eta_t^2$ has a nondegenerate distribution, if $E|\log\eta_t^2| < \infty$ and if, for some $r > 1$, $\sum_i\phi_ir^i < \infty$, then there exists a strictly stationary and nonanticipative solution to model (2.40) given by (2.42).

Other integrated processes are the long-memory ARCH, for which the rate of decrease of the $\phi_i$ is not geometric.
2.3.2 ARCH(∞) Representation of a GARCH

It is sometimes useful to consider the ARCH(∞) representation of a GARCH(p, q) process. For instance, this representation allows the conditional variance $\sigma_t^2$ of $\epsilon_t$ to be written explicitly as a function of its infinite past. It also allows the positivity conditions (2.20) on the coefficients to be weakened. Let us first consider the GARCH(1, 1) model. If $\beta < 1$, we have

$$\sigma_t^2 = \frac{\omega}{1 - \beta} + \alpha\sum_{i=1}^{\infty}\beta^{i-1}\epsilon_{t-i}^2. \qquad (2.46)$$

In this case we have

$$A_s = \alpha^s\sum_{i=1}^{\infty}\beta^{(i-1)s} = \frac{\alpha^s}{1 - \beta^s}.$$

The condition $A_s\mu_{2s} < 1$ thus takes the form

$$\alpha^s\mu_{2s} + \beta^s < 1, \quad \text{for some } s \in (0, 1].$$
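For the GARCH(1, 1), condition (2.41) thus reduces to a scalar inequality that is easy to scan numerically when $\eta_t \sim \mathcal{N}(0, 1)$, using $\mu_{2s} = 2^s\Gamma(s + 1/2)/\sqrt{\pi}$. In this sketch the grid search and the helper names are illustrative choices. Note that when $\beta > 0$ the left-hand side tends to 2 as $s \to 0$, so, unlike in the ARCH(1) case $\beta = 0$, small values of $s$ never help.

```python
import math

def mu_2s(s):
    """mu_{2s} = E|eta_t|^{2s} = 2^s Gamma(s + 1/2) / sqrt(pi) for eta_t ~ N(0, 1)."""
    return 2.0**s * math.gamma(s + 0.5) / math.sqrt(math.pi)

def arch_inf_condition(alpha, beta, n_grid=1000):
    """Smallest grid value s in (0, 1] with alpha^s mu_{2s} + beta^s < 1, else None."""
    for k in range(1, n_grid + 1):
        s = k / n_grid
        if alpha**s * mu_2s(s) + beta**s < 1.0:
            return s
    return None

print(arch_inf_condition(0.1, 0.85), arch_inf_condition(2.5, 0.0), arch_inf_condition(4.0, 0.0))
```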


For example, if $\alpha + \beta < 1$ this condition is satisfied for $s = 1$. However, second-order stationarity is not necessary for the validity of (2.46). Indeed, if $(\epsilon_t)$ denotes the strictly stationary and nonanticipative solution of the GARCH(1, 1) model, then, for any $q \ge 1$,

$$\sigma_t^2 = \omega\sum_{i=1}^{q}\beta^{i-1} + \alpha\sum_{i=1}^{q}\beta^{i-1}\epsilon_{t-i}^2 + \beta^q\sigma_{t-q}^2. \qquad (2.47)$$

By Corollary 2.3 there exists $s \in (0, 1)$ such that $E(\sigma_t^{2s}) = c < \infty$. It follows that $\sum_{q\ge1}E(\beta^q\sigma_{t-q}^2)^s = \beta^sc/(1 - \beta^s) < \infty$. So $\beta^q\sigma_{t-q}^2$ converges a.s. to 0 and, by letting $q$ go to infinity in (2.47), we get (2.46). More generally, we have the following property.


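Theorem 2.8 (ARCH(∞) representation of a GARCH(p, q)) If $(\epsilon_t)$ is the strictly stationary and nonanticipative solution of model (2.5), it admits an ARCH(∞) representation of the form (2.40). The constants $\phi_i$ are given by

$$\phi_0 = \frac{\omega}{\mathcal{B}(1)}, \qquad \sum_{i=1}^{\infty}\phi_iz^i = \frac{\mathcal{A}(z)}{\mathcal{B}(z)}, \quad |z| \le 1, \qquad (2.48)$$

where $\mathcal{A}(z) = \alpha_1z + \cdots + \alpha_qz^q$ and $\mathcal{B}(z) = 1 - \beta_1z - \cdots - \beta_pz^p$.

Proof. Rewrite the model in vector form as

$$\underline{\sigma}_t^2 = B\underline{\sigma}_{t-1}^2 + \underline{c}_t,$$

where $\underline{\sigma}_t^2 = (\sigma_t^2, \dots, \sigma_{t-p+1}^2)'$, $\underline{c}_t = (\omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2, 0, \dots, 0)'$ and $B$ is the matrix defined in Corollary 2.2. This corollary shows that, under the strict stationarity condition, we have $\rho(B) < 1$. Moreover, $E\|\underline{c}_t\|^s < \infty$ by Corollary 2.3. Consequently, the components of the vector $\sum_{i=0}^{\infty}B^i\underline{c}_{t-i}$ are almost surely real-valued. We thus have

$$\sigma_t^2 = e'\sum_{i=0}^{\infty}B^i\underline{c}_{t-i}, \qquad e = (1, 0, \dots, 0)'.$$

All that remains is to note that the coefficients obtained in this ARCH(∞) representation coincide with those of (2.48).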
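The coefficients in (2.48) can be obtained by power-series division of $\mathcal{A}(z)$ by $\mathcal{B}(z)$, which gives the recursion $\phi_i = \alpha_i + \sum_{j=1}^{\min(i-1, p)}\beta_j\phi_{i-j}$, with $\alpha_i = 0$ for $i > q$. The sketch below (hypothetical function name) implements it; for the GARCH(1, 1) it returns $\phi_i = \alpha\beta^{i-1}$, in agreement with (2.46). The second example, with a negative $\alpha_2$, shows that all $\phi_i$ can be nonnegative although (2.20) fails, which is the situation discussed in the next paragraph.

```python
import numpy as np

def arch_inf_coefficients(omega, alpha, beta, n):
    """phi_0 and phi_1, ..., phi_n of (2.48), via power-series division of A(z) by B(z).

    A(z) = alpha_1 z + ... + alpha_q z^q, B(z) = 1 - beta_1 z - ... - beta_p z^p.
    Recursion: phi_i = alpha_i + sum_{j=1}^{min(i-1, p)} beta_j phi_{i-j}.
    """
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    phi = np.zeros(n + 1)
    for i in range(1, n + 1):
        a_i = alpha[i - 1] if i <= len(alpha) else 0.0
        # j stops at i - 1, so phi[0] (set below) never enters the recursion
        phi[i] = a_i + sum(beta[j - 1] * phi[i - j] for j in range(1, min(i - 1, len(beta)) + 1))
    phi[0] = omega / (1.0 - beta.sum())      # phi_0 = omega / B(1)
    return phi

print(arch_inf_coefficients(0.1, [0.1], [0.85], 5))         # GARCH(1, 1)
print(arch_inf_coefficients(0.1, [0.1, -0.05], [0.8], 5))   # alpha_2 < 0
```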
The ARCH(∞) representation can be used to weaken the conditions (2.20) imposed to ensure the positivity of $\sigma_t^2$. Consider the GARCH(p, q) model with $\omega > 0$, without any a priori positivity constraint on the coefficients $\alpha_i$ and $\beta_j$, and assuming that the roots of the polynomial $\mathcal{B}(z) = 1 - \beta_1z - \cdots - \beta_pz^p$ have moduli strictly greater than 1. The coefficients $\phi_i$ introduced in (2.48) are then well defined, and under the assumption

$$\phi_i \ge 0, \quad i = 1, \dots, \qquad (2.49)$$

we have

$$\sigma_t^2 = \phi_0 + \sum_{i=1}^{\infty}\phi_i\epsilon_{t-i}^2 \in (0, +\infty].$$

Indeed, $\phi_0 > 0$ because otherwise $\mathcal{B}(1) \le 0$, which would imply the existence of a root of $\mathcal{B}$ in $(0, 1]$, since $\mathcal{B}(0) = 1$. Moreover, the proofs of the sufficient parts of Theorem 2.4, Lemma 2.3 and Corollary 2.3 do not use the positivity of the coefficients of the matrices $A_t$. It follows that if the top Lyapunov exponent of the sequence $(A_t)$ is such that $\gamma < 0$, the variable $\sigma_t^2$ is a.s. finite-valued. To summarize, the conditions

$$\omega > 0, \quad \gamma < 0, \quad \phi_i \ge 0,\ i \ge 1, \quad \text{and} \quad (\mathcal{B}(z) = 0 \implies |z| > 1)$$

imply that there exists a strictly stationary and nonanticipative solution to model (2.5). The conditions (2.49) are not generally simple to use, however, because they imply an infinity of constraints on the coefficients $\alpha_i$ and $\beta_j$. In the ARCH(q) case, they reduce to the conditions (2.20), that is, $\alpha_i \ge 0$, $i = 1, \dots, q$. Similarly, in the GARCH(1, 1) case, it is necessary to have $\alpha_1 \ge 0$ and $\beta_1 \ge 0$. However, for $p > 1$ and $q > 1$, the conditions (2.20) can be weakened (Exercise 2.14).


2.3.3 Long-Memory ARCH 


The introduction of long memory into the volatility can be motivated by the observation that the empirical autocorrelations of the squares, or of the absolute values, of financial series decay very slowly in general (see for example Table 2.1). We shall see that it is possible to reproduce this property by introducing ARCH(∞) processes, with a sufficiently slow decay of the modulus of the coefficients $\phi_i$.⁸ A process $(X_t)$ is said to have long memory if it is second-order stationary and satisfies, for $h \to \infty$,

$$|\operatorname{Cov}(X_t, X_{t-h})| \sim Kh^{2d-1}, \quad \text{where } d < \frac{1}{2} \qquad (2.50)$$

and $K$ is a nonzero constant. An alternative definition relies on distinguishing 'intermediate-memory' processes for which $d < 0$ and thus $\sum_{h=-\infty}^{+\infty}|\operatorname{Cov}(X_t, X_{t-h})| < \infty$, and 'long-memory' processes for which $d \in (0, 1/2)$ and thus $\sum_{h=-\infty}^{+\infty}|\operatorname{Cov}(X_t, X_{t-h})| = \infty$ (see Brockwell and Davis, 1991, p. 520). The autocorrelations of an ARMA process decrease at exponential rate when the lag increases. The need for processes with a slower autocovariance decay leads to the introduction of the fractional ARIMA models. These models are defined through the fractional difference operator

$$(1 - B)^d = 1 + \sum_{j=1}^{\infty}\frac{d(d-1)\cdots(d-j+1)}{j!}(-B)^j, \quad d > -1.$$


8 The slow decrease of the empirical autocorrelations can also be explained by nonstationarity phenomena, 
such as structural changes or breaks (see, for instance, Mikosch and Starica, 2004). 
Denoting by $\pi_j$ the coefficient of $B^j$ in this sum, it can be shown that $\pi_j \sim K j^{-d-1}$ when $j \to \infty$, where $K$ is a constant depending on $d$. An ARIMA($p, d, q$) process with $d \in\, ]-0.5, 0.5[$ is defined as a stationary solution of

$$\psi(B)(1 - B)^d X_t = \theta(B)\epsilon_t,$$


where $\epsilon_t$ is a white noise, and $\psi$ and $\theta$ are polynomials of degree $p$ and $q$, respectively. If the roots of these polynomials are all outside the unit disk, the unique stationary and purely nondeterministic solution is causal and invertible, and its covariances satisfy (2.50) (see Brockwell and Davis, 1991, Theorem 13.2.2). By analogy with the ARIMA models, the class of FIGARCH($p, d, q$) processes is defined by the equations


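$$\epsilon_t = \sigma_t \eta_t, \qquad \sigma_t^2 = \phi_0 + \left\{1 - (1 - B)^d \frac{\theta(B)}{\psi(B)}\right\} \epsilon_t^2, \qquad d \in\, ]0, 1[, \quad \phi_0 > 0, \tag{2.51}$$

where $\psi$ and $\theta$ are polynomials of degree $p$ and $q$ respectively, such that $\psi(0) = \theta(0) = 1$, the roots of $\psi$ have moduli strictly greater than 1 and $\phi_i \ge 0$, where the $\phi_i$ are defined by

$$\sum_{i=1}^{\infty} \phi_i z^i = 1 - (1 - z)^d \frac{\theta(z)}{\psi(z)}.$$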
We have $\phi_i \sim K i^{-d-1}$, where $K$ is a positive constant, when $i \to \infty$, and $\sum_{i=1}^{\infty} \phi_i = 1$. The process introduced in (2.51) is thus an IARCH($\infty$), provided it exists. Note that the existence of this process cannot be obtained from Theorem 2.7 because, for the FIGARCH model, the $\phi_i$ decrease more slowly than the geometric rate. The following result, which is a consequence of Theorem 2.6 and whose proof is the subject of Exercise 2.20, provides another sufficient condition for the existence of IARCH($\infty$) processes.
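The weights $\phi_i$ in (2.51) are rarely available in closed form, but they are easy to compute by truncating the power series. The Python sketch below is illustrative only: the function names, the truncation length `n` and the convention $\theta(z) = 1 + \theta_1 z + \cdots$, $\psi(z) = 1 + \psi_1 z + \cdots$ for the tuple arguments are our assumptions. The coefficients of $(1-z)^d$ are generated by the recursion $\pi_j = \pi_{j-1}(j-1-d)/j$, which follows from the operator expansion given earlier.

```python
import math

import numpy as np


def figarch_phi(d, theta=(), psi=(), n=1000):
    """Return phi_1, ..., phi_n with sum_i phi_i z^i = 1 - (1 - z)^d theta(z) / psi(z).

    Assumed (hypothetical) convention: theta(z) = 1 + theta[0] z + theta[1] z^2 + ...
    and psi(z) = 1 + psi[0] z + psi[1] z^2 + ...
    """
    # Coefficients pi_j of (1 - z)^d via pi_j = pi_{j-1} (j - 1 - d) / j.
    pi = np.empty(n + 1)
    pi[0] = 1.0
    for j in range(1, n + 1):
        pi[j] = pi[j - 1] * (j - 1 - d) / j

    # Multiply by theta(z), keeping the first n + 1 coefficients.
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    num = np.convolve(pi, th)[: n + 1]

    # Divide by psi(z): q_k = num_k - sum_{i >= 1} psi_i q_{k-i}.
    ps = np.asarray(psi, dtype=float)
    q = np.empty(n + 1)
    for k in range(n + 1):
        acc = num[k]
        for i in range(1, min(k, len(ps)) + 1):
            acc -= ps[i - 1] * q[k - i]
        q[k] = acc

    # q_0 = 1, so 1 - q(z) has no constant term and phi_i = -q_i for i >= 1.
    return -q[1:]


def partial_sum_remainder(d, n):
    """1 - sum_{i <= n} phi_i for FIGARCH(0, d, 0): coefficient of z^n in (1 - z)^(d - 1)."""
    return math.exp(math.lgamma(n + 1 - d) - math.lgamma(n + 1) - math.lgamma(1 - d))


phi = figarch_phi(0.4, n=10_000)
print(phi[:3])                                              # phi_1, phi_2, phi_3
print(1 - phi.sum(), partial_sum_remainder(0.4, 10_000))    # the two values should agree
```

For FIGARCH(0, $d$, 0) every $\phi_i$ is positive and, since $\sum_{i=0}^{n}\pi_i$ is the coefficient of $z^n$ in $(1-z)^{d-1}$, the remainder $1 - \sum_{i \le n}\phi_i$ equals $\Gamma(n+1-d)/\{\Gamma(n+1)\Gamma(1-d)\}$, which is of order $n^{-d}$: the partial sums approach 1 only slowly, in line with $\phi_i \sim K i^{-d-1}$.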


Corollary 2.6 (Existence of some FIGARCH processes) If $A_1 = 1$, then condition (2.41) is satisfied if and only if there exists $p^* \in (0, 1]$ such that $A_{p^*} < \infty$ and

$$\sum_{i=1}^{\infty} \phi_i \log \phi_i + E(\eta_0^2 \log \eta_0^2) \in (0, +\infty]. \tag{2.52}$$

The strictly stationary and nonanticipative solution of model (2.40) is thus given by (2.42) and is such that $E|\epsilon_t|^q < \infty$ for any $q \in [0, 2[$, and $E\epsilon_t^2 = \infty$.


This result can be used to prove the existence of FIGARCH($p, d, q$) processes for $d \in (0, 1[$ sufficiently close to 1, if the distribution of $\eta_0^2$ is assumed to be nondegenerate (hence, $E(\eta_0^2 \log \eta_0^2) > 0$); see Douc, Roueff and Soulier (2008). The FIGARCH process of Corollary 2.6 does not admit a finite second-order moment. Its square is thus not a long-memory process in the sense of definition (2.50). More generally, it can be shown that the squares of ARCH($\infty$) processes do not have the long-memory property. This motivated the introduction of an alternative class, called linear ARCH (LARCH) and defined by


$$\epsilon_t = \sigma_t \eta_t, \qquad \sigma_t = b_0 + \sum_{i=1}^{\infty} b_i \epsilon_{t-i}, \qquad \eta_t \ \text{iid}\ (0, 1). \tag{2.53}$$


Under appropriate conditions, this model is compatible with the long-memory property for $\epsilon_t^2$.


2.4 Properties of the Marginal Distribution


We have seen that, under quite simple conditions, a GARCH($p, q$) model admits a strictly stationary solution $(\epsilon_t)$. However, the marginal distribution of the process $(\epsilon_t)$ is never known explicitly. The aim of this section is to highlight some properties of this distribution through the marginal moments.


2.4.1 Even-Order Moments 


We are interested in finding existence conditions for the moments of order $2m$, where $m$ is any positive integer.$^9$ Let $\otimes$ denote the tensor product, or Kronecker product, and recall that it is defined as follows: for any matrices $A = (a_{ij})$ and $B$, we have $A \otimes B = (a_{ij}B)$. For any matrix $A$, let $A^{\otimes m} = A \otimes \cdots \otimes A$. We have the following result.


Theorem 2.9 ($2m$th-order stationarity) Let $A^{(m)} = E(A_t^{\otimes m})$, where $A_t$ is defined by (2.17). Suppose that $E(\eta_t^{2m}) < \infty$ and that the spectral radius

$$\rho(A^{(m)}) < 1.$$

Then, for any $t \in \mathbb{Z}$, the series $(z_t)$ defined in (2.18) converges in $L^m$ and the process $(\epsilon_t^2)$, defined as the first component of $z_t$, is strictly stationary and admits moments up to order $m$. Conversely, if $\rho(A^{(m)}) \ge 1$, there exists no strictly stationary solution $(\epsilon_t)$ to (2.5) such that $E(\epsilon_t^{2m}) < \infty$.


Example 2.3 (Moments of a GARCH(1, 1) process) When $p = q = 1$, the matrix $A_t$ is written as

$$A_t = (\eta_t^2, 1)'(\alpha_1, \beta_1).$$

Hence, all the eigenvalues of the matrix $A^{(m)} = E\{(\eta_t^2, 1)'^{\otimes m}\}(\alpha_1, \beta_1)^{\otimes m}$ are null except one. The nonzero eigenvalue is thus the trace of $A^{(m)}$. It readily follows that the necessary and sufficient condition for the existence of $E(\epsilon_t^{2m})$ is


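$$\sum_{i=0}^{m} \binom{m}{i} \alpha_1^i \beta_1^{m-i} \mu_{2i} < 1, \tag{2.54}$$

where $\mu_{2i} = E(\eta_t^{2i})$, $i = 0, \dots, m$. The moments can be computed recursively, by expanding $E(z_t^{\otimes m}) = E(b_t + A_t z_{t-1})^{\otimes m}$. For the fourth-order moment, a direct computation gives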


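$$\begin{aligned} E(\epsilon_t^4) &= E(\sigma_t^4)\,E(\eta_t^4) \\ &= \mu_4 \left\{\omega^2 + 2\omega(\alpha_1 + \beta_1) E(\epsilon_{t-1}^2) + (\beta_1^2 + 2\alpha_1\beta_1) E(\sigma_{t-1}^4) + \alpha_1^2 E(\epsilon_{t-1}^4)\right\} \end{aligned}$$

and thus

$$E(\epsilon_t^4) = \frac{\omega^2 (1 + \alpha_1 + \beta_1)\,\mu_4}{(1 - \mu_4\alpha_1^2 - \beta_1^2 - 2\alpha_1\beta_1)(1 - \alpha_1 - \beta_1)},$$

provided that the denominator is positive. Figure 2.10 shows the zones of second-order and fourth-order stationarity for the strong GARCH(1, 1) model when $\eta_t \sim \mathcal{N}(0, 1)$.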


9 Only even-order moments are considered because, if a symmetry assumption is made on the distribution of $\eta_t$, the odd-order moments are null when they exist. If this symmetry assumption is not made, computing these moments seems extremely difficult.
Figure 2.10 Regions of moment existence for the GARCH(1, 1) model: 1, moment of order 4; 1 and 2, moment of order 2; 3, infinite variance.


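This example shows that, for a nontrivial GARCH process, that is, when the $\alpha_i$ and $\beta_j$ are not all equal to zero, the moments cannot exist at all orders (with Gaussian $\eta_t$, for instance, the term $\alpha_1^m \mu_{2m}$ in (2.54) exceeds 1 for $m$ large enough).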
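Example 2.3 can be checked numerically. The sketch below assumes Gaussian $\eta_t$, so that $\mu_{2i} = (2i-1)!!$, and builds $A^{(m)} = E(A_t^{\otimes m})$ by writing $A_t = \eta_t^2 B + C$ with $B = e_1(\alpha_1, \beta_1)$ and $C = e_2(\alpha_1, \beta_1)$ and expanding over the $2^m$ Kronecker products; it then compares the spectral radius with the left-hand side of (2.54). The function names are hypothetical, and the brute-force expansion is only practical for small $m$.

```python
import itertools
from math import comb

import numpy as np


def gaussian_moment(k):
    """E(eta^{2k}) for eta ~ N(0, 1), i.e. (2k - 1)!! (equal to 1 for k = 0)."""
    out = 1.0
    for j in range(1, 2 * k, 2):
        out *= j
    return out


def A_m(alpha, beta, m, mu=gaussian_moment):
    """E(A_t^{(x)m}) for a GARCH(1, 1), with A_t = eta_t^2 B + C."""
    B = np.array([[alpha, beta], [0.0, 0.0]])
    C = np.array([[0.0, 0.0], [alpha, beta]])
    total = np.zeros((2 ** m, 2 ** m))
    # Each factor of the Kronecker power is eta^2 B or C; a term containing
    # k factors eta^2 B has expectation mu(k) times the Kronecker product.
    for choice in itertools.product((0, 1), repeat=m):
        term = np.array([[1.0]])
        for c in choice:
            term = np.kron(term, B if c else C)
        total += mu(sum(choice)) * term
    return total


def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))


def lhs_254(alpha, beta, m, mu=gaussian_moment):
    """Left-hand side of (2.54)."""
    return sum(comb(m, i) * alpha ** i * beta ** (m - i) * mu(i) for i in range(m + 1))


alpha, beta = 0.1, 0.8
for m in (1, 2, 3):
    print(m, spectral_radius(A_m(alpha, beta, m)), lhs_254(alpha, beta, m))
```

Both columns coincide because $A^{(m)}$ has rank one, so its only nonzero eigenvalue is its trace.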


Proof of Theorem 2.9. For $k > 0$, let

$$A_{t,k} = A_t A_{t-1} \cdots A_{t-k+1}, \qquad z_{t,k} = A_{t,k} b_{t-k},$$

with the convention that $A_{t,0} = I_{p+q}$ and $z_{t,0} = b_t$. Notice that the components of $z_t := \sum_{k=0}^{\infty} z_{t,k}$ are almost surely defined in $[0, +\infty) \cup \{\infty\}$, without any restriction on the model coefficients. Let $\|\cdot\|$ denote the matrix norm such that $\|A\| = \sum_{i,j} |a_{ij}|$. Using the elementary equalities

$$\|A\|\,\|B\| = \|A \otimes B\| = \|B \otimes A\|$$

and the associativity of the Kronecker product, we obtain, for $k > 0$,

$$E\|z_{t,k}\|^m = E\|A_{t,k} b_{t-k} \otimes \cdots \otimes A_{t,k} b_{t-k}\| = \|E(A_{t,k} b_{t-k})^{\otimes m}\|,$$

since the elements of the matrix $A_{t,k} b_{t-k}$ are positive. For any vector $X$ conformable to the matrix $A$, we have

$$(AX)^{\otimes m} = A^{\otimes m} X^{\otimes m}$$

by the property of the Kronecker product $AB \otimes CD = (A \otimes C)(B \otimes D)$, for any matrices such that the products $AB$ and $CD$ are well defined. It follows that

$$E\|z_{t,k}\|^m = \|E(A_{t,k}^{\otimes m} b_{t-k}^{\otimes m})\| = \|E(A_t^{\otimes m} \cdots A_{t-k+1}^{\otimes m} b_{t-k}^{\otimes m})\|. \tag{2.55}$$

Let $b^{(m)} = E(b_t^{\otimes m})$ and recall that $A^{(m)} = E(A_t^{\otimes m})$. In view of (2.55), we get

$$E\|z_{t,k}\|^m = \|(A^{(m)})^k b^{(m)}\|,$$
using the independence between the matrices in the product $A_t \cdots A_{t-k+1} b_{t-k}$ (since $A_{t-i}$ is a function of $\eta_{t-i}$). The matrix norm being multiplicative, it follows, using (2.18), that

$$\|z_t\|_m := \{E\|z_t\|^m\}^{1/m} \le \sum_{k=0}^{\infty} \|z_{t,k}\|_m \le \left\{\sum_{k=0}^{\infty} \|(A^{(m)})^k\|^{1/m}\right\} \|b^{(m)}\|^{1/m}. \tag{2.56}$$

If the spectral radius of the matrix $A^{(m)}$ is strictly less than 1, then $\|(A^{(m)})^k\|$ converges to zero at an exponential rate when $k$ tends to infinity. In this case, $z_t$ is almost surely finite. It is the strictly stationary solution of equation (2.16), and this solution belongs to $L^m$. It is clear, on the other hand, that

$$\|\epsilon_t^2\|_m \le \|z_t\|_m$$

because the norm of $z_t$ is greater than that of any of its components. A sufficient condition for the existence of $E\epsilon_t^{2m}$ is then $\rho(A^{(m)}) < 1$.

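Conversely, suppose that $(\epsilon_t^2)$ belongs to $L^m$. For any vectors $x$ and $y$ of the same dimension, let $x \le y$ mean that the components of $y - x$ are all nonnegative. Then, for any $n \ge 0$,

$$\begin{aligned} E z_t^{\otimes m} &= E(z_{t,0} + \cdots + z_{t,n} + A_t \cdots A_{t-n} z_{t-n-1})^{\otimes m} \\ &\ge E\left(\sum_{k=0}^{n} z_{t,k}\right)^{\otimes m} \\ &\ge \sum_{k=0}^{n} E z_{t,k}^{\otimes m} \\ &= \sum_{k=0}^{n} (A^{(m)})^k b^{(m)} \end{aligned}$$

because all the terms involved in these expressions are nonnegative. Since the components of $E(z_t^{\otimes m})$ are finite, we have

$$\lim_{n \to \infty} (A^{(m)})^n b^{(m)} = 0. \tag{2.57}$$

To conclude, it suffices to show that

$$\lim_{n \to \infty} (A^{(m)})^n = 0 \tag{2.58}$$

because this is equivalent to $\rho(A^{(m)}) < 1$. To deduce (2.58) from (2.57), we need to show that, for some fixed integer $k$,

$$\text{the components of } (A^{(m)})^k b^{(m)} \text{ are all strictly positive.} \tag{2.59}$$

Indeed, $A^{(m)}$ having nonnegative entries, (2.59) implies that each canonical basis vector $e_i$ satisfies $e_i \le c\,(A^{(m)})^k b^{(m)}$ for some $c > 0$, so that $(A^{(m)})^n e_i \le c\,(A^{(m)})^{n+k} b^{(m)} \to 0$ by (2.57). The previous computations showed that

$$(A^{(m)})^k b^{(m)} = E(z_{t,k}^{\otimes m}).$$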


Given the form of the matrices $A_t$, the $q$th component of $z_{t,q}$ is the first component of $b_{t-q}$, which is not almost surely equal to zero. First suppose that $\alpha_q$ and $\beta_p$ are both not equal to zero. In this case the first component of $z_{t,k}$ cannot be equal to zero almost surely, for any $k \ge q + 1$. Also, still in view of the form of the matrices $A_t$, the $i$th component of $z_{t,k}$ is the $(i-1)$th component of $z_{t-1,k-1}$, for $i = 2, \dots, q$. Hence, none of the first $q$ components of $z_{t,2q}$ can be equal to zero almost surely, and the same property holds for $z_{t,k}$ whatever $k \ge 2q$. The same argument shows that none of the last $p$ components of $z_{t,k}$ is equal to zero almost surely when $k \ge 2p$. Taking into account the positivity of the variables $z_{t,k}$, this shows that in the case $\alpha_q\beta_p \ne 0$, (2.59) holds true for $k \ge \max\{2p, 2q\}$. If $\alpha_q\beta_p = 0$, one can replace $z_t$ by a vector of smaller size, obtained by canceling the component $\epsilon_{t-q+1}^2$ if $\alpha_q = 0$ and the component $\sigma_{t-p+1}^2$ if $\beta_p = 0$. The matrix $A_t$ is then replaced by a matrix of smaller size, but with the same nonzero eigenvalues as $A_t$. Similarly, the matrix $A^{(m)}$ will be replaced by a matrix of smaller size but with the same nonzero eigenvalues. If $\alpha_{q-1}\beta_{p-1} \ne 0$ we are led to the preceding case; otherwise we pursue the dimension reduction.


2.4.2 Kurtosis 


An easy way to measure the size of distribution tails is to use the kurtosis coefficient. This 
coefficient is defined, for a centered (zero-mean) distribution, as the ratio of the fourth-order 
moment, which is assumed to exist, to the squared second-order moment. This coefficient is equal 
to 3 for a normal distribution, this value serving as a gauge for the other distributions. In the case 
of GARCH processes, it is interesting to note the difference between the tails of the marginal and 
conditional distributions. For a strictly stationary solution $(\epsilon_t)$ of the GARCH($p, q$) model defined by (2.5), the conditional moments of order $k$ are proportional to $\sigma_t^k$:

$$E(\epsilon_t^k \mid \epsilon_u, u < t) = \sigma_t^k E(\eta_t^k).$$


The kurtosis coefficient of this conditional distribution is thus constant and equal to the kurtosis coefficient of $\eta_t$. For a general process of the form

$$\epsilon_t = \sigma_t \eta_t,$$

where $\sigma_t$ is a measurable function of the past of $\epsilon_t$, $\eta_t$ is independent of this past and $(\eta_t)$ is iid centered with unit variance, the kurtosis coefficient of the stationary marginal distribution is equal, provided that it exists, to

$$\kappa_\epsilon := \frac{E(\epsilon_t^4)}{\{E(\epsilon_t^2)\}^2} = \frac{E\{E(\epsilon_t^4 \mid \epsilon_u, u < t)\}}{[E\{E(\epsilon_t^2 \mid \epsilon_u, u < t)\}]^2} = \frac{E(\sigma_t^4)}{\{E(\sigma_t^2)\}^2}\,\kappa_\eta,$$

where $\kappa_\eta = E\eta_t^4$ denotes the kurtosis coefficient of $(\eta_t)$. It can thus be seen that the tails of the marginal distribution of $(\epsilon_t)$ are fatter when the variance of $\sigma_t^2$ is large relative to the squared expectation. The minimum (corresponding to the absence of ARCH effects) is given by the kurtosis coefficient of $(\eta_t)$, since $E(\sigma_t^4) \ge \{E(\sigma_t^2)\}^2$:

$$\kappa_\epsilon \ge \kappa_\eta,$$

with equality if and only if $\sigma_t^2$ is almost surely constant. In the GARCH(1, 1) case we thus have,
from the previous calculations,

$$\kappa_\epsilon = \frac{1 - (\alpha_1 + \beta_1)^2}{1 - (\alpha_1 + \beta_1)^2 - \alpha_1^2(\kappa_\eta - 1)}\,\kappa_\eta.$$

The excess kurtosis coefficients of $\epsilon_t$ and $\eta_t$, relative to the normal distribution, are related by

$$\kappa_\epsilon^* = \kappa_\epsilon - 3 = \frac{6\alpha_1^2 + \kappa_\eta^*\{1 - (\alpha_1 + \beta_1)^2 + 3\alpha_1^2\}}{1 - (\alpha_1 + \beta_1)^2 - 2\alpha_1^2 - \kappa_\eta^*\alpha_1^2}, \qquad \kappa_\eta^* = \kappa_\eta - 3. \tag{2.60}$$

The excess kurtosis of $\epsilon_t$ increases with that of $\eta_t$ and when the GARCH coefficients approach the zone of nonexistence of the fourth-order moment. Notice the asymmetry between the GARCH coefficients in the excess kurtosis formula. For the general GARCH($p, q$), we have the following result.


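Proposition 2.1 (Excess kurtosis of a GARCH($p, q$) process) Let $(\epsilon_t)$ denote a GARCH($p, q$) process admitting moments up to order 4 and let $\kappa_\epsilon^*$ be its excess kurtosis coefficient. Let $a = \sum_{i=1}^{\infty} \psi_i^2$, where the coefficients $\psi_i$ are defined by

$$1 + \sum_{i=1}^{\infty} \psi_i z^i = \left\{1 - \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) z^i\right\}^{-1} \left(1 - \sum_{i=1}^{p} \beta_i z^i\right).$$

Then the excess kurtosis of the distribution of $\epsilon_t$ relative to the Gaussian is

$$\kappa_\epsilon^* = \frac{6a + \kappa_\eta^*(1 + 3a)}{1 - a(\kappa_\eta^* + 2)}.$$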


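Proof. The ARMA($\max(p, q), p$) representation (2.4) implies

$$\epsilon_t^2 = E\epsilon_t^2 + \nu_t + \sum_{i=1}^{\infty} \psi_i \nu_{t-i},$$

where $\sum_{i=1}^{\infty} |\psi_i| < \infty$ follows from the condition $\sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) < 1$, which is a consequence of the existence of $E\epsilon_t^4$. The process $(\nu_t) = (\sigma_t^2(\eta_t^2 - 1))$ is a weak white noise of variance $\mathrm{Var}(\nu_t) = E\sigma_t^4\, E(\eta_t^2 - 1)^2 = (\kappa_\eta - 1) E\sigma_t^4$. It follows that

$$\begin{aligned} \mathrm{Var}(\epsilon_t^2) &= \mathrm{Var}(\nu_t) + \sum_{i=1}^{\infty} \psi_i^2\, \mathrm{Var}(\nu_t) = (a + 1)(\kappa_\eta - 1) E\sigma_t^4 \\ &= E\epsilon_t^4 - (E\epsilon_t^2)^2 = \kappa_\eta E\sigma_t^4 - (E\epsilon_t^2)^2, \end{aligned}$$

hence

$$E\sigma_t^4 = \frac{(E\epsilon_t^2)^2}{\kappa_\eta - (a + 1)(\kappa_\eta - 1)}$$

and

$$\kappa_\epsilon = \frac{\kappa_\eta E\sigma_t^4}{(E\epsilon_t^2)^2} = \frac{\kappa_\eta}{\kappa_\eta - (a + 1)(\kappa_\eta - 1)} = \frac{\kappa_\eta}{1 - a(\kappa_\eta - 1)},$$

and the proposition follows.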
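Proposition 2.1 lends itself to a direct computation. Identifying coefficients in the definition of the $\psi_i$ gives $\psi_0 = 1$ and $\psi_k = -\beta_k + \sum_{i=1}^{\min(k, r)} (\alpha_i + \beta_i)\psi_{k-i}$, with $r = \max(p, q)$ and $\beta_k = 0$ for $k > p$. The sketch below (function names hypothetical) truncates $a = \sum_i \psi_i^2$ after `n_terms` terms, an assumption that is harmless unless $\sum(\alpha_i + \beta_i)$ is very close to 1.

```python
import numpy as np


def psi_weights(alpha, beta, n_terms):
    """psi_1..psi_n with 1 + sum psi_i z^i = (1 - sum beta_i z^i) / (1 - sum (alpha_i + beta_i) z^i)."""
    r = max(len(alpha), len(beta))
    a = np.zeros(r)
    a[: len(alpha)] = alpha
    b = np.zeros(r)
    b[: len(beta)] = beta
    s = a + b                                   # AR coefficients alpha_i + beta_i
    psi = np.zeros(n_terms + 1)
    psi[0] = 1.0
    for k in range(1, n_terms + 1):
        acc = -b[k - 1] if k <= r else 0.0      # -beta_k (zero beyond p)
        for i in range(1, min(k, r) + 1):
            acc += s[i - 1] * psi[k - i]
        psi[k] = acc
    return psi[1:]


def garch_excess_kurtosis(alpha, beta, kappa_eta_star, n_terms=5000):
    """Proposition 2.1: excess kurtosis of eps_t given the excess kurtosis of eta_t."""
    a = float(np.sum(psi_weights(alpha, beta, n_terms) ** 2))
    denom = 1.0 - a * (kappa_eta_star + 2.0)
    if denom <= 0:
        raise ValueError("the fourth-order moment of eps_t does not exist")
    return (6.0 * a + kappa_eta_star * (1.0 + 3.0 * a)) / denom


print(garch_excess_kurtosis([0.1], [0.8], 0.0))   # Gaussian eta_t
print(garch_excess_kurtosis([0.1], [0.8], 3.0))   # eta_t with excess kurtosis 3
```

For a GARCH(1, 1), $\psi_i = \alpha_1(\alpha_1 + \beta_1)^{i-1}$ and the output agrees with (2.60); the second call also illustrates how strongly the answer depends on the assumed $\kappa_\eta^*$.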


It will be seen in Chapter 7 that the Gaussian quasi-maximum likelihood estimator of the coefficients of a GARCH model is consistent and asymptotically normal even if the distribution of the variables $\eta_t$ is not Gaussian. Since the autocorrelation function of the squares of the GARCH process does not depend on the law of $\eta_t$, the autocorrelations obtained by replacing the unknown coefficients by their estimates are generally very close to the empirical autocorrelations. In contrast, the kurtosis coefficients obtained from the theoretical formula, by replacing the coefficients by their estimates and the kurtosis of $\eta_t$ by 3, can be very far from the coefficients obtained empirically. This is not very surprising since the preceding result shows that the difference between the kurtosis coefficients of $\epsilon_t$ computed with a Gaussian and a non-Gaussian distribution for $\eta_t$,

$$\kappa_\epsilon^*(\kappa_\eta^*) - \kappa_\epsilon^*(0) = \frac{\kappa_\eta^*(1 + a)}{\{1 - a(\kappa_\eta^* + 2)\}(1 - 2a)},$$

is not bounded as $a$ approaches $(\kappa_\eta^* + 2)^{-1}$.
2.5 Autocovariances of the Squares of a GARCH


We have seen that if $(\epsilon_t)$ is a GARCH process which is fourth-order stationary, then $(\epsilon_t^2)$ is an ARMA process. It must be noted that this ARMA is very constrained, as can be seen from (2.4): the order of the AR part is at least as large as that of the MA part, and the AR coefficients are greater than or equal to those of the MA part, which are nonnegative. We shall start by examining some consequences of these constraints on the autocovariances of $(\epsilon_t^2)$. Then we shall show how to compute these autocovariances explicitly.


2.5.1 Positivity of the Autocovariances 


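For a GARCH(1, 1) model such that $E\epsilon_t^4 < \infty$, the autocorrelations of the squares take the form

$$\rho_{\epsilon^2}(h) := \mathrm{Corr}(\epsilon_t^2, \epsilon_{t-h}^2) = \rho_{\epsilon^2}(1)(\alpha_1 + \beta_1)^{h-1}, \qquad h \ge 1, \tag{2.61}$$

where

$$\rho_{\epsilon^2}(1) = \frac{\alpha_1\{1 - \beta_1(\alpha_1 + \beta_1)\}}{1 - (\alpha_1 + \beta_1)^2 + \alpha_1^2} \tag{2.62}$$

(Exercise 2.8). It follows immediately that these autocorrelations are nonnegative. The next property generalizes this result.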


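Proposition 2.2 (Positivity of the autocovariances) If the GARCH($p, q$) process $(\epsilon_t)$ admits moments of order 4, then

$$\gamma_{\epsilon^2}(h) = \mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) \ge 0, \qquad \forall h.$$

If, moreover, $\alpha_1 > 0$, then

$$\gamma_{\epsilon^2}(h) > 0, \qquad \forall h.$$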


Proof. It suffices to show that in the MA($\infty$) expansion of $\epsilon_t^2$, all the coefficients are nonnegative (and strictly positive when $\alpha_1 > 0$). In the notation introduced in (2.2), this expansion takes the form

$$\epsilon_t^2 = \{1 - (\alpha + \beta)(1)\}^{-1}\omega + \{1 - (\alpha + \beta)(B)\}^{-1}\{1 - \beta(B)\}\nu_t := \omega^* + \psi(B)\nu_t.$$

Noting that $1 - \beta(B) = 1 - (\alpha + \beta)(B) + \alpha(B)$, it suffices to show the nonnegativity of the coefficients $c_i$ in the series expansion

$$\frac{\alpha(B)}{1 - (\alpha + \beta)(B)} = \sum_{i=1}^{\infty} c_i B^i.$$

We show by induction that

$$c_i \ge \alpha_1(\alpha_1 + \beta_1)^{i-1}, \qquad i \ge 1.$$

Obviously $c_1 = \alpha_1$. Moreover, with the convention $\alpha_i = 0$ if $i > q$ and $\beta_j = 0$ if $j > p$,

$$c_{i+1} = c_1(\alpha_i + \beta_i) + \cdots + c_i(\alpha_1 + \beta_1) + \alpha_{i+1}.$$

If the inductive assumption holds true at order $i$, using the positivity of the GARCH coefficients we deduce that $c_{i+1} \ge c_i(\alpha_1 + \beta_1) \ge \alpha_1(\alpha_1 + \beta_1)^i$. Hence the desired result.
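The induction can be mirrored numerically. This sketch (function name hypothetical) computes the $c_i$ with the recursion displayed above, using the same conventions $\alpha_i = 0$ for $i > q$ and $\beta_j = 0$ for $j > p$, so that the lower bound $c_i \ge \alpha_1(\alpha_1 + \beta_1)^{i-1}$ can be checked on any given parameter vector.

```python
import numpy as np


def ma_coefficients(alpha, beta, n):
    """c_1..c_n in alpha(B) / {1 - (alpha + beta)(B)} = sum_i c_i B^i (needs n >= p, q)."""
    a = np.zeros(n + 1)
    a[1 : len(alpha) + 1] = alpha            # a[i] = alpha_i, zero for i > q
    s = np.zeros(n + 1)
    s[1 : len(alpha) + 1] += alpha
    s[1 : len(beta) + 1] += beta             # s[i] = alpha_i + beta_i
    c = np.zeros(n + 1)
    for i in range(1, n + 1):
        # c_i = alpha_i + c_1 s_{i-1} + ... + c_{i-1} s_1
        c[i] = a[i] + sum(c[j] * s[i - j] for j in range(1, i))
    return c[1:]


alpha, beta = [0.05, 0.10], [0.40, 0.30]     # an arbitrary GARCH(2, 2) example
c = ma_coefficients(alpha, beta, 30)
bound = alpha[0] * (alpha[0] + beta[0]) ** np.arange(30)
print(np.all(c >= bound))
```

For a GARCH(1, 1) the bound is attained, since then $c_i = \alpha_1(\alpha_1 + \beta_1)^{i-1}$ exactly.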


Remark 2.7 The property that the autocovariances are nonnegative is satisfied, more generally, for an ARCH($\infty$) process of the form (2.40), provided it is fourth-order stationary. It can also be shown that the square of an ARCH($\infty$) process is associated. On these properties see Giraitis, Leipus and Surgailis (2009) and references therein.


Note that the property of positive autocorrelations for the squares, or for the absolute values, is 
typically observed on real financial series (see, for instance, the second row of Table 2.1). 


2.5.2 The Autocovariances Do Not Always Decrease 


Formulas (2.61) and (2.62) show that for a GARCH(1, 1) process, the autocorrelations of the 
squares decrease. An illustration is provided in Figure 2.11. A natural question is whether this 
property remains true for more general GARCH(p, q) processes. The following computation shows 
that this is not the case. Consider an ARCH(2) process admitting moments of order 4 (the existence 
condition and the computation of the fourth-order moment are the subject of Exercise 2.7): 


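$$\epsilon_t = \sqrt{\omega + \alpha_1\epsilon_{t-1}^2 + \alpha_2\epsilon_{t-2}^2}\;\eta_t.$$

We know that $\epsilon_t^2$ is an AR(2) process, whose autocorrelation function satisfies

$$\rho_{\epsilon^2}(h) = \alpha_1\rho_{\epsilon^2}(h-1) + \alpha_2\rho_{\epsilon^2}(h-2), \qquad h > 0.$$

It readily follows, since $\rho_{\epsilon^2}(1) = \alpha_1/(1 - \alpha_2)$, that

$$\frac{\rho_{\epsilon^2}(2)}{\rho_{\epsilon^2}(1)} = \frac{\alpha_1^2 + \alpha_2(1 - \alpha_2)}{\alpha_1}$$

and hence that

$$\rho_{\epsilon^2}(2) < \rho_{\epsilon^2}(1) \iff \alpha_2(1 - \alpha_2) < \alpha_1(1 - \alpha_1).$$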


The latter inequality is of course true for the ARCH(1) process ($\alpha_2 = 0$) but does not hold for all $(\alpha_1, \alpha_2)$. Figure 2.12 gives an illustration of this nondecreasing feature of the first autocorrelations (and partial autocorrelations). The sequence of autocorrelations is, however, decreasing after a certain lag (Exercise 2.16).
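The behavior shown in Figure 2.12 can be reproduced with the AR(2) recursion above, started from $\rho_{\epsilon^2}(0) = 1$ and $\rho_{\epsilon^2}(1) = \alpha_1/(1 - \alpha_2)$. The sketch also checks the fourth-order moment condition of Exercise 2.7, here under the assumption of Gaussian $\eta_t$ ($\mu_4 = 3$); the function names are hypothetical.

```python
import numpy as np


def arch2_fourth_moment_exists(alpha1, alpha2, mu4=3.0):
    """Condition of Exercise 2.7 for E(eps_t^4) < infinity in an ARCH(2)."""
    return alpha2 < 1 and mu4 * alpha1 ** 2 < (1 - alpha2) / (1 + alpha2) * (1 - mu4 * alpha2 ** 2)


def arch2_sq_acf(alpha1, alpha2, hmax):
    """rho(0), ..., rho(hmax) of eps_t^2 for an ARCH(2) with finite fourth moment."""
    rho = np.empty(hmax + 1)
    rho[0] = 1.0
    rho[1] = alpha1 / (1.0 - alpha2)        # Yule-Walker equation at h = 1
    for h in range(2, hmax + 1):
        rho[h] = alpha1 * rho[h - 1] + alpha2 * rho[h - 2]
    return rho


alpha1, alpha2 = 0.1, 0.25                  # parameters of Figure 2.12
print(arch2_fourth_moment_exists(alpha1, alpha2))
print(np.round(arch2_sq_acf(alpha1, alpha2, 8), 4))
```

With these values $\alpha_2(1 - \alpha_2) = 0.1875 > \alpha_1(1 - \alpha_1) = 0.09$, so $\rho_{\epsilon^2}(2) > \rho_{\epsilon^2}(1)$; the sequence oscillates over the first lags and decreases monotonically from lag 4 onwards.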




Figure 2.11 Autocorrelation function (left) and partial autocorrelation function (right) of the squares of the GARCH(1, 1) model $\epsilon_t = \sigma_t\eta_t$, $\sigma_t^2 = 1 + 0.3\epsilon_{t-1}^2 + 0.55\sigma_{t-1}^2$, $(\eta_t)$ iid $\mathcal{N}(0, 1)$.

Figure 2.12 Autocorrelation function (left) and partial autocorrelation function (right) of the squares of the ARCH(2) process $\epsilon_t = \sigma_t\eta_t$, $\sigma_t^2 = 1 + 0.1\epsilon_{t-1}^2 + 0.25\epsilon_{t-2}^2$, $(\eta_t)$ iid $\mathcal{N}(0, 1)$.


2.5.3 Explicit Computation of the Autocovariances of the Squares 


The autocorrelation function of (e?) will play an important role in identifying the orders of the 
model. This function is easily obtained from the ARMA(max(p, q), p) representation 


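$$\epsilon_t^2 - \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i)\epsilon_{t-i}^2 = \omega + \nu_t - \sum_{i=1}^{p} \beta_i\nu_{t-i}.$$

The autocovariance function is more difficult to obtain (Exercise 2.8) because one has to compute

$$E\nu_t^2 = E(\eta_t^2 - 1)^2\, E\sigma_t^4.$$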


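One can use the method of Section 2.4.1. Consider the vector representation $z_t = b_t + A_t z_{t-1}$ defined by (2.16) and (2.17). Using the independence between $z_{t-1}$ and $(b_t, A_t)$, together with elementary properties of the Kronecker product $\otimes$, we get

$$\begin{aligned} Ez_t^{\otimes 2} &= E(b_t + A_t z_{t-1}) \otimes (b_t + A_t z_{t-1}) \\ &= Eb_t \otimes b_t + EA_t z_{t-1} \otimes b_t + Eb_t \otimes A_t z_{t-1} + EA_t z_{t-1} \otimes A_t z_{t-1} \\ &= b^{(2)} + E(A_t \otimes b_t)\,Ez_{t-1} + E(b_t \otimes A_t)\,Ez_{t-1} + A^{(2)}\,Ez_{t-1}^{\otimes 2}. \end{aligned}$$

Thus

$$Ez_t^{\otimes 2} = \left(I_{(p+q)^2} - A^{(2)}\right)^{-1}\left\{b^{(2)} + (EA_t \otimes b_t + Eb_t \otimes A_t)\,z^{(1)}\right\}, \tag{2.63}$$

where

$$A^{(m)} = E(A_t^{\otimes m}), \qquad z^{(m)} = Ez_t^{\otimes m}, \qquad b^{(m)} = E(b_t^{\otimes m}).$$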


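To compute $A^{(2)}$, we can use the decomposition $A_t = \eta_t^2 B + C$, where $B$ and $C$ are deterministic matrices. We then have, letting $\mu_4 = E\eta_t^4$,

$$A^{(2)} = E(\eta_t^2 B + C) \otimes (\eta_t^2 B + C) = \mu_4 B^{\otimes 2} + B \otimes C + C \otimes B + C^{\otimes 2}.$$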
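We obtain $EA_t \otimes b_t$ and $Eb_t \otimes A_t$ similarly. All the components of $z^{(1)}$ are equal to $\omega/(1 - \sum\alpha_i - \sum\beta_i)$. Note that for $h > 0$, we have

$$\begin{aligned} Ez_t \otimes z_{t-h} &= E(b_t + A_t z_{t-1}) \otimes z_{t-h} \\ &= b^{(1)} \otimes z^{(1)} + (A^{(1)} \otimes I_{p+q})\,Ez_{t-1} \otimes z_{t-h}. \end{aligned} \tag{2.64}$$

Let $e_1 = (1, 0, \dots, 0)' \in \mathbb{R}^{(p+q)^2}$. The following algorithm can be used:

• Define the vectors $z^{(1)}$, $b^{(1)}$, $b^{(2)}$, and the matrices $EA_t \otimes b_t$, $Eb_t \otimes A_t$, $A^{(1)}$, $A^{(2)}$ as functions of $\alpha_i$, $\beta_i$, $\omega$ and $\mu_4$.

• Compute $Ez_t^{\otimes 2}$ from (2.63).

• For $h = 1, 2, \dots$, compute $Ez_t \otimes z_{t-h}$ from (2.64).

• For $h = 0, 1, \dots$, compute $\gamma_{\epsilon^2}(h) = e_1'\left(Ez_t \otimes z_{t-h} - z^{(1)} \otimes z^{(1)}\right)$.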


This algorithm is not very efficient in terms of computation time and memory space, but it is easy 
to implement. 
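The algorithm above can be sketched in a few lines for the GARCH(1, 1) case. We assume that, for $p = q = 1$, representation (2.16)–(2.17) reads $z_t = (\epsilon_t^2, \sigma_t^2)'$, $b_t = (\omega\eta_t^2, \omega)'$ and $A_t = \eta_t^2 B + C$ with $B = \begin{pmatrix}\alpha_1 & \beta_1 \\ 0 & 0\end{pmatrix}$ and $C = \begin{pmatrix}0 & 0 \\ \alpha_1 & \beta_1\end{pmatrix}$, which agrees with $A_t = (\eta_t^2, 1)'(\alpha_1, \beta_1)$ in Example 2.3. The function name is hypothetical and $E\eta_t^2 = 1$ is used throughout; the result can be compared with (2.61)–(2.62).

```python
import numpy as np


def garch11_sq_autocov(omega, alpha, beta, mu4=3.0, hmax=10):
    """gamma_{eps^2}(0..hmax) for a GARCH(1, 1) via (2.63)-(2.64); mu4 = E(eta^4)."""
    B = np.array([[alpha, beta], [0.0, 0.0]])
    C = np.array([[0.0, 0.0], [alpha, beta]])
    bB = np.array([[omega], [0.0]])              # b_t = eta_t^2 bB + bC
    bC = np.array([[0.0], [omega]])
    A1, b1 = B + C, bB + bC                      # A^(1) = E A_t, b^(1) = E b_t
    A2 = mu4 * np.kron(B, B) + np.kron(B, C) + np.kron(C, B) + np.kron(C, C)
    b2 = mu4 * np.kron(bB, bB) + np.kron(bB, bC) + np.kron(bC, bB) + np.kron(bC, bC)
    EAb = mu4 * np.kron(B, bB) + np.kron(B, bC) + np.kron(C, bB) + np.kron(C, bC)
    EbA = mu4 * np.kron(bB, B) + np.kron(bB, C) + np.kron(bC, B) + np.kron(bC, C)
    z1 = np.full((2, 1), omega / (1.0 - alpha - beta))

    # (2.63): E z_t (x) z_t
    M = np.linalg.solve(np.eye(4) - A2, b2 + (EAb + EbA) @ z1)
    mean_sq = z1[0, 0] ** 2                      # (E eps_t^2)^2
    gammas = [M[0, 0] - mean_sq]
    # (2.64): E z_t (x) z_{t-h} for h = 1, 2, ...
    for _ in range(hmax):
        M = np.kron(b1, z1) + np.kron(A1, np.eye(2)) @ M
        gammas.append(M[0, 0] - mean_sq)
    return np.array(gammas)


g = garch11_sq_autocov(1.0, 0.3, 0.55, mu4=3.0, hmax=10)   # model of Figure 2.11
print(g[1:] / g[0])                                          # autocorrelations of eps_t^2
```

As noted in the text, this route is wasteful (it manipulates $(p+q)^2$-dimensional objects), but every step is a plain matrix operation.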


2.6 Theoretical Predictions 


The definition of GARCH processes in terms of conditional expectations allows us to compute the 
optimal predictions of the process and its square given its infinite past. Let $(\epsilon_t)$ be a stationary GARCH($p, q$) process, in the sense of Definition 2.1. The optimal prediction (in the $L^2$ sense) of $\epsilon_t$ given its infinite past is 0 by Definition 2.1(i). More generally, for $h \ge 0$,

$$E(\epsilon_{t+h} \mid \epsilon_u, u < t) = E\{E(\epsilon_{t+h} \mid \epsilon_{t+h-1}, \epsilon_{t+h-2}, \dots) \mid \epsilon_u, u < t\} = 0, \qquad t \in \mathbb{Z},$$

which shows that the optimal prediction of any future variable given the infinite past is zero. The main attraction of GARCH models obviously lies not in the prediction of the GARCH process itself but in the prediction of its square. The optimal prediction of $\epsilon_t^2$ given the infinite past of $\epsilon_t$ is $\sigma_t^2$. More generally, the predictions at horizon $h \ge 0$ are obtained recursively by

$$\begin{aligned} E(\epsilon_{t+h}^2 \mid \epsilon_u, u < t) &= E(\sigma_{t+h}^2 \mid \epsilon_u, u < t) \\ &= \omega + \sum_{i=1}^{q} \alpha_i E(\epsilon_{t+h-i}^2 \mid \epsilon_u, u < t) + \sum_{j=1}^{p} \beta_j E(\sigma_{t+h-j}^2 \mid \epsilon_u, u < t), \end{aligned}$$

with, for $i \le h$,

$$E(\epsilon_{t+h-i}^2 \mid \epsilon_u, u < t) = E(\sigma_{t+h-i}^2 \mid \epsilon_u, u < t),$$

for $i > h$,

$$E(\epsilon_{t+h-i}^2 \mid \epsilon_u, u < t) = \epsilon_{t+h-i}^2,$$

and for $i \ge h$,

$$E(\sigma_{t+h-i}^2 \mid \epsilon_u, u < t) = \sigma_{t+h-i}^2.$$
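The recursion above translates directly into a loop in which each unobserved future square is replaced by its conditional expectation. In this sketch the function name and the input convention are hypothetical: `eps2_hist` and `sig2_hist` are chronological lists ending at date $t-1$ and containing at least $q$ and $p$ values respectively; the first output is $\sigma_t^2$.

```python
def predict_squares(omega, alpha, beta, eps2_hist, sig2_hist, horizon):
    """Return [E(eps_{t+h}^2 | eps_u, u < t) for h = 0, ..., horizon] for a GARCH(p, q)."""
    e = list(eps2_hist)   # E(eps_s^2 | past): observed values for s < t
    s = list(sig2_hist)   # E(sigma_s^2 | past): known values for s < t
    preds = []
    for _ in range(horizon + 1):
        sig2 = (omega
                + sum(a * e[-i] for i, a in enumerate(alpha, start=1))
                + sum(b * s[-j] for j, b in enumerate(beta, start=1)))
        preds.append(sig2)
        s.append(sig2)    # E(sigma_{t+h}^2 | past)
        e.append(sig2)    # unobserved eps_{t+h}^2 replaced by its conditional mean
    return preds


# GARCH(1, 1) with omega = 1, alpha = 0.1, beta = 0.8, eps_{t-1}^2 = 4, sigma_{t-1}^2 = 2
print(predict_squares(1.0, [0.1], [0.8], [4.0], [2.0], 5))
```

For a GARCH(1, 1) the output reduces to $\omega(1 + \cdots + (\alpha_1+\beta_1)^{h-1}) + (\alpha_1+\beta_1)^h\sigma_t^2$, so the predictions converge geometrically to the unconditional variance $\omega/(1 - \alpha_1 - \beta_1)$.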


These predictions coincide with the optimal linear predictions of the future values of $\epsilon_t^2$ given its infinite past. We shall consider in Chapter 4 a more general class of GARCH models (weak GARCH) for which the two types of predictions, optimal and linear optimal, do not necessarily coincide.

It is important to note that $E(\epsilon_{t+h}^2 \mid \epsilon_u, u < t) = \mathrm{Var}(\epsilon_{t+h} \mid \epsilon_u, u < t)$ is the conditional variance of the prediction error of $\epsilon_{t+h}$. Hence, the accuracy of the predictions depends on the past: it is particularly low after a turbulent period, that is, when the past values are large in absolute value (assuming that the coefficients $\alpha_i$ and $\beta_j$ are nonnegative). This property constitutes a crucial difference with standard ARMA models, for which the magnitude of the prediction intervals is constant, for a given horizon.

Figures 2.13–2.16, based on simulations, allow us to visualize this difference. In Figure 2.13, obtained from a Gaussian white noise, the predictions at horizon 1 have a constant variance: the confidence interval $[-1.96, 1.96]$ contains roughly 95% of the realizations. Using a constant interval for the next three series, displayed in Figures 2.14–2.16, would give very poor results. In contrast, the intervals constructed here (for conditionally Gaussian distributions, with zero mean and variance $\sigma_t^2$) do contain about 95% of the observations: in quiet periods a small interval is enough, whereas in turbulent periods the variability increases and larger intervals are needed.
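A small simulation conveys the same message as Figures 2.13–2.16. The sketch below simulates the GARCH(1, 1) of Figure 2.14 ($\omega = 1$, $\alpha = 0.1$, $\beta = 0.8$, Gaussian $\eta_t$) and compares the coverage of the horizon-1 intervals $\pm 1.96\sigma_t$ with that of the fixed interval $[-1.96, 1.96]$. The seed, sample size, burn-in and function name are arbitrary choices of ours, and the start at the unconditional variance assumes $\alpha + \beta < 1$ (so it does not apply as written to the parameters of Figure 2.16).

```python
import numpy as np


def simulate_garch11(omega, alpha, beta, n, rng, burn=500):
    """Simulate a strong GARCH(1, 1) with N(0, 1) innovations; return (eps, sigma2)."""
    eta = rng.standard_normal(n + burn)
    eps = np.empty(n + burn)
    sig2 = np.empty(n + burn)
    s2 = omega / (1.0 - alpha - beta)    # start at the unconditional variance
    e2 = s2
    for t in range(n + burn):
        s2 = omega + alpha * e2 + beta * s2
        sig2[t] = s2
        eps[t] = np.sqrt(s2) * eta[t]
        e2 = eps[t] ** 2
    return eps[burn:], sig2[burn:]


rng = np.random.default_rng(12345)
eps, sig2 = simulate_garch11(1.0, 0.1, 0.8, 50_000, rng)
adaptive = np.mean(np.abs(eps) <= 1.96 * np.sqrt(sig2))   # intervals +/- 1.96 sigma_t
fixed = np.mean(np.abs(eps) <= 1.96)                      # constant interval [-1.96, 1.96]
print(f"adaptive: {adaptive:.3f}, fixed: {fixed:.3f}")
```

Since $\epsilon_t/\sigma_t = \eta_t$, the adaptive intervals cover about 95% of the observations whatever the GARCH parameters, whereas the coverage of the constant interval depends entirely on the level of volatility.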

For a strong GARCH process it is possible to go further, by computing optimal predictions of the powers of $\epsilon_t^2$, provided that the corresponding moments exist for the process $(\eta_t)$. For instance, computing the predictions of $\epsilon_t^4$ allows us to evaluate the variance of the prediction errors of $\epsilon_t^2$. However, these computations are tedious, the linearity property being lost for such powers.

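When the GARCH process is not directly observed but is the innovation of an ARMA process, the accuracy of the prediction at some date $t$ directly depends on the magnitude of the conditional heteroscedasticity at this date. Consider, for instance, a stationary AR(1) process whose innovation is a GARCH(1, 1) process:

$$\begin{cases} X_t = \phi X_{t-1} + \epsilon_t \\ \epsilon_t = \sigma_t\eta_t \\ \sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \end{cases} \tag{2.65}$$

where $\omega > 0$, $\alpha \ge 0$, $\beta \ge 0$, $\alpha + \beta < 1$ and $|\phi| < 1$. We have, for $h \ge 0$,

$$X_{t+h} = \epsilon_{t+h} + \phi\epsilon_{t+h-1} + \cdots + \phi^h\epsilon_t + \phi^{h+1}X_{t-1}.$$

Hence,

$$E(X_{t+h} \mid X_u, u < t) = \phi^{h+1}X_{t-1},$$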


Figure 2.13 Prediction intervals at horizon 1, at 95%, for the strong $\mathcal{N}(0, 1)$ white noise.




Figure 2.14 Prediction intervals at horizon 1, at 95%, for the GARCH(1, 1) process simulated with $\omega = 1$, $\alpha = 0.1$, $\beta = 0.8$ and $\mathcal{N}(0, 1)$ distribution for $(\eta_t)$.

Figure 2.15 Prediction intervals at horizon 1, at 95%, for the GARCH(1, 1) process simulated with $\omega = 1$, $\alpha = 0.6$, $\beta = 0.2$ and $\mathcal{N}(0, 1)$ distribution for $(\eta_t)$.

Figure 2.16 Prediction intervals at horizon 1, at 95%, for the GARCH(1, 1) process simulated with $\omega = 1$, $\alpha = 0.7$, $\beta = 0.3$ and $\mathcal{N}(0, 1)$ distribution for $(\eta_t)$.


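since the past of $X_t$ coincides with that of its innovation $\epsilon_t$. Moreover, since the $\epsilon_{t+i}$ are conditionally uncorrelated given the past,

$$\begin{aligned} \mathrm{Var}(X_{t+h} \mid X_u, u < t) &= \mathrm{Var}\left(\sum_{i=0}^{h} \phi^i \epsilon_{t+h-i} \,\Big|\, \epsilon_u, u < t\right) \\ &= \sum_{i=0}^{h} \phi^{2i}\,\mathrm{Var}(\epsilon_{t+h-i} \mid \epsilon_u, u < t). \end{aligned}$$

Since $\mathrm{Var}(\epsilon_t \mid \epsilon_u, u < t) = \sigma_t^2$ and, for $i \ge 1$,

$$\begin{aligned} \mathrm{Var}(\epsilon_{t+i} \mid \epsilon_u, u < t) &= E(\sigma_{t+i}^2 \mid \epsilon_u, u < t) \\ &= \omega + (\alpha + \beta)E(\sigma_{t+i-1}^2 \mid \epsilon_u, u < t) \\ &= \omega\{1 + \cdots + (\alpha + \beta)^{i-1}\} + (\alpha + \beta)^i\sigma_t^2, \end{aligned}$$

we have

$$\mathrm{Var}(\epsilon_{t+i} \mid \epsilon_u, u < t) = \frac{\omega}{1 - (\alpha + \beta)} + (\alpha + \beta)^i\left\{\sigma_t^2 - \frac{\omega}{1 - (\alpha + \beta)}\right\}, \qquad \text{for all } i \ge 0.$$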


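Consequently,

$$\begin{aligned} \mathrm{Var}(X_{t+h} \mid X_u, u < t) &= \sum_{i=0}^{h} \phi^{2(h-i)}\left[\frac{\omega}{1 - (\alpha + \beta)} + (\alpha + \beta)^i\left\{\sigma_t^2 - \frac{\omega}{1 - (\alpha + \beta)}\right\}\right] \\ &= \frac{\omega}{1 - (\alpha + \beta)}\,\frac{1 - \phi^{2(h+1)}}{1 - \phi^2} + \left\{\sigma_t^2 - \frac{\omega}{1 - (\alpha + \beta)}\right\}\frac{\phi^{2(h+1)} - (\alpha + \beta)^{h+1}}{\phi^2 - (\alpha + \beta)} \end{aligned}$$

if $\phi^2 \ne \alpha + \beta$, and

$$\mathrm{Var}(X_{t+h} \mid X_u, u < t) = \frac{\omega}{1 - (\alpha + \beta)}\,\frac{1 - \phi^{2(h+1)}}{1 - \phi^2} + \left\{\sigma_t^2 - \frac{\omega}{1 - (\alpha + \beta)}\right\}(h + 1)\phi^{2h}$$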


if $\phi^2 = \alpha + \beta$. The coefficient of $\sigma_t^2 - \omega/(1 - \alpha - \beta)$ always being positive, it can be seen that the variance of the prediction at horizon $h$ increases linearly with the difference between the conditional variance at time $t$ and the unconditional variance of $\epsilon_t$. A large negative difference (corresponding to a low-volatility period) thus results in highly accurate predictions. Conversely, the accuracy deteriorates when $\sigma_t^2$ is large. When the horizon $h$ increases, the importance of this factor decreases. If $h$ tends to infinity, we retrieve the unconditional variance of $X_t$:

$$\lim_{h \to \infty} \mathrm{Var}(X_{t+h} \mid X_u, u < t) = \mathrm{Var}(X_t) = \frac{\mathrm{Var}(\epsilon_t)}{1 - \phi^2}.$$
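The two expressions for the prediction variance of the AR(1)-GARCH(1, 1) model (2.65) can be cross-checked numerically: one function sums $\phi^{2(h-i)}\mathrm{Var}(\epsilon_{t+i} \mid \epsilon_u, u < t)$ term by term, the other evaluates the closed form for $\phi^2 \ne \alpha + \beta$. This is a sketch; the function names are hypothetical and `sigma2_t` stands for the known conditional variance $\sigma_t^2$.

```python
import numpy as np


def ar1_garch11_pred_var(h, phi, omega, alpha, beta, sigma2_t):
    """Var(X_{t+h} | X_u, u < t) by direct summation (requires alpha + beta < 1)."""
    s = alpha + beta
    c = omega / (1.0 - s)                        # unconditional variance of eps_t
    i = np.arange(h + 1)
    var_eps = c + s ** i * (sigma2_t - c)        # Var(eps_{t+i} | past), i = 0..h
    return float(np.sum(phi ** (2 * (h - i)) * var_eps))


def ar1_garch11_pred_var_closed(h, phi, omega, alpha, beta, sigma2_t):
    """Closed form of the text, valid when phi^2 != alpha + beta."""
    s = alpha + beta
    c = omega / (1.0 - s)
    return (c * (1 - phi ** (2 * (h + 1))) / (1 - phi ** 2)
            + (sigma2_t - c) * (phi ** (2 * (h + 1)) - s ** (h + 1)) / (phi ** 2 - s))


# phi = 0.5, omega = 1, alpha = 0.1, beta = 0.8: quiet (sigma_t^2 = 2) versus turbulent (30) date
for sigma2_t in (2.0, 30.0):
    print(sigma2_t, [round(ar1_garch11_pred_var(h, 0.5, 1.0, 0.1, 0.8, sigma2_t), 3) for h in (0, 1, 5, 50)])
```

Starting from a quiet date gives markedly smaller prediction variances at short horizons, while both sequences approach $\mathrm{Var}(\epsilon_t)/(1 - \phi^2) = 10/0.75$ as $h$ grows.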


Now we consider two nonstationary situations. If $|\phi| = 1$, and initializing, for instance at 0, all the variables at negative dates (because here the infinite pasts of $X_t$ and $\epsilon_t$ do not coincide), the previous formula becomes

$$\mathrm{Var}(X_{t+h} \mid X_u, u < t) = \frac{\omega(h + 1)}{1 - (\alpha + \beta)} + \left\{\sigma_t^2 - \frac{\omega}{1 - (\alpha + \beta)}\right\}\frac{1 - (\alpha + \beta)^{h+1}}{1 - (\alpha + \beta)}.$$

Thus, the impact of the observations before time $t$ does not vanish as $h$ increases. It becomes negligible, however, compared to the deterministic part, which is proportional to $h$. If $|\phi| < 1$ and $\alpha + \beta = 1$ (IGARCH(1, 1) errors), we have

$$\mathrm{Var}(\epsilon_{t+i} \mid \epsilon_u, u < t) = \omega i + \sigma_t^2, \qquad \text{for all } i \ge 0,$$

and it can be seen that the impact of the past variables on the variance of the predictions remains constant as the horizon increases. This phenomenon is called persistence of shocks on the volatility. Note, however, that, as in the preceding case, the nonrandom part of the decomposition of $\mathrm{Var}(\epsilon_{t+i} \mid \epsilon_u, u < t)$ becomes dominant when the horizon tends to infinity. The asymptotic precision of the predictions of $\epsilon_t$ is null, and this is also the case for $X_t$ since

$$\mathrm{Var}(X_{t+h} \mid X_u, u < t) \ge \mathrm{Var}(\epsilon_{t+h} \mid \epsilon_u, u < t).$$
Picard (1992a) established the converse property showing that, under an irreducibility condition, a necessary condition for the existence of a strictly stationary and nonanticipative solution is that $\gamma < 0$. Liu (2007) used Representation (2.30) to obtain stationarity conditions for more general
GARCH models. The second-order stationarity condition for the GARCH(p, q) model was obtained 
polynomial and vice versa) studied by Breidt, Davis and Trindade (2001).

ARCH($\infty$) models were introduced by Robinson (1991); see Giraitis, Leipus and Surgailis (2009) for the study of these models. The condition for the existence of a strictly stationary ARCH($\infty$) process was established by Robinson and Zaffaroni (2006) and Douc, Roueff and Soulier (2008). The condition for the existence of a second-order stationary solution, as well as the positivity of the autocovariances of the squares, were obtained by Giraitis, Kokoszka and Leipus (2000). Theorems 2.8 and 2.7 were proven by Kazakevičius and Leipus (2002, 2003). The uniqueness of an ARCH($\infty$) solution is discussed in Kazakevičius and Leipus (2007). The asymptotic properties of quasi-maximum likelihood estimators were established by Robinson and Zaffaroni (2006). See Doukhan, Teyssière and Winant (2006) for the study of multivariate extensions of ARCH($\infty$) models. The introduction of FIGARCH models is due to Baillie, Bollerslev and Mikkelsen (1996), but the existence of solutions was only recently established by Douc, Roueff and Soulier (2008), where Corollary 2.6 is proven. LARCH($\infty$) models were introduced by Robinson (1991) and their probabilistic properties studied by Giraitis, Robinson and Surgailis (2000), Giraitis and Surgailis (2002), Berkes and Horváth (2003a) and Giraitis et al. (2004). The estimation of such models has been studied by Beran and Schützner (2009), Truquet (2008) and Francq and Zakoian (2009c).

The fourth-order moment structure and the autocovariances of the squares of GARCH processes were analyzed by Milhøj (1984), Karanasos (1999) and He and Teräsvirta (1999). The necessary and sufficient condition for the existence of even-order moments was established by Ling and McAleer (2002a), the sufficient part having been obtained by Chen and An (1998). Ling and McAleer (2002b) derived an existence condition for the moment of order $s$, with $s > 0$, for a family of GARCH processes including the standard model and the extensions presented in Chapter 10. The computation of the kurtosis coefficient for a general GARCH($p, q$) model is due to Bai, Russell and Tiao (2004).

Several authors have studied the tail properties of the stationary distribution. See Mikosch and Stărică (2000), Borkovec and Klüppelberg (2001), Basrak, Davis and Mikosch (2002) and Davis and Mikosch (2009).

Andersen and Bollerslev (1998) discussed the predictive qualities of GARCH, making a clear 
distinction between the prediction of volatility and that of the squared returns (Exercise 2.21). 


2.8 Exercises 


2.1 (Noncorrelation of $\epsilon_t$ with any function of its past?)
For a GARCH process, does $\mathrm{Cov}(\epsilon_t, f(\epsilon_{t-h})) = 0$ hold for any function $f$ and any $h > 0$?


2.2 (Strict stationarity of GARCH(1, 1) for two laws of $\eta_t$)
In the GARCH(1, 1) case, give an explicit strict stationarity condition when (i) the only possible values of $\eta_t$ are $-1$ and 1; (ii) $\eta_t$ follows a uniform distribution.


2.3 (Lyapunov coefficient of a constant sequence of matrices)
Prove equality (2.21) for a diagonalizable matrix. Use the Jordan representation to extend this result to any square matrix.


2.4 (Lyapunov coefficient of a sequence of matrices)
Consider the sequence $(A_t)$ defined by $A_t = z_t A$, where $(z_t)$ is an ergodic sequence of real random variables such that $E\log^+|z_t| < \infty$, and $A$ is a square nonrandom matrix. Find the Lyapunov coefficient $\gamma$ of the sequence $(A_t)$ and give an explicit expression for the condition $\gamma < 0$.


2.5 (Multiplicative norms) 
Show the results of footnote 6 on page 30. 


2.6 (Another vector representation of the GARCH($p, q$) model)
Verify that the vector $z_t^* = (\sigma_t^2, \dots, \sigma_{t-p+1}^2, \epsilon_{t-1}^2, \dots, \epsilon_{t-q+1}^2)' \in \mathbb{R}^{p+q-1}$ allows us to define, for $p \ge 1$ and $q \ge 2$, a vector representation which is equivalent to those used in this chapter, of the form

$$z_t^* = b_t^* + A_t^* z_{t-1}^*.$$


2.7 (Fourth-order moment of an ARCH(2) process)
Show that for an ARCH(2) model, the condition for the existence of the moment of order 4, with $\mu_4 = E\eta_t^4$, is written as

$$\alpha_2 < 1 \quad \text{and} \quad \mu_4\alpha_1^2 < \frac{1 - \alpha_2}{1 + \alpha_2}\,(1 - \mu_4\alpha_2^2).$$

Compute this moment.


2.8 (Direct computation of the autocorrelations and autocovariances of the square of a GARCH(1, 1) process)
Find the autocorrelation and autocovariance functions of $(\epsilon_t^2)$ when $(\epsilon_t)$ is solution of the GARCH(1, 1) model

$$\begin{cases} \epsilon_t = \sigma_t\eta_t \\ \sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \end{cases}$$

where $(\eta_t) \sim \mathcal{N}(0, 1)$ and $1 - 3\alpha^2 - \beta^2 - 2\alpha\beta > 0$.


2.9 (Computation of the autocovariance of the square of a GARCH(1, 1) process by the general method)
Use the method of Section 2.5.3 to find the autocovariance function of $(\epsilon_t^2)$ when $(\epsilon_t)$ is solution of a GARCH(1, 1) model. Compare with the method used in Exercise 2.8.


2.10 (Characteristic polynomial of $EA_t$)
Let $A = EA_t$, where $\{A_t, t \in \mathbb{Z}\}$ is the sequence defined in (2.17).

1. Prove equality (2.38).
2. If $\sum_{i=1}^{q}\alpha_i + \sum_{j=1}^{p}\beta_j = 1$, show that $\rho(A) = 1$.


2.11 (A condition for a sequence $X_n$ to be $o(n)$)
Let $(X_n)$ be a sequence of identically distributed random variables, admitting an expectation. Show that

$$\frac{X_n}{n} \to 0 \quad \text{when } n \to \infty$$

with probability 1. Prove that the convergence may fail if the expectation of $X_n$ does not exist (an iid sequence with density $f(x) = x^{-2}\mathbb{1}_{x \ge 1}$ may be considered).


2.12 (A case of dependent variables where the expectation of a product equals the product of the expectations)
Prove equality (2.31).


2.13 (Necessary condition for the existence of the moment of order $2s$)
Suppose that $(\epsilon_t)$ is the strictly stationary solution of model (2.5) with $E\epsilon_t^{2s} < \infty$, for $s \in (0, 1]$. Let

$$z_t^{(K)} = b_t + \sum_{k=1}^{K} A_t A_{t-1} \cdots A_{t-k+1} b_{t-k}. \tag{2.66}$$
1. Show that when $K \to \infty$,

$$E\|\underline{z}_t^{(K)} - \underline{z}_t\|^s \to 0.$$

2. Use this result to prove that $E(\|A_k A_{k-1} \cdots A_1\underline{b}_0\|^s) \to 0$ as $k \to \infty$.
3. Let $(X_n)$ be a sequence of $\ell \times m$ matrices and $Y = (Y_1, \ldots, Y_m)'$ a vector which is independent of $(X_n)$ and such that for all $i$, $0 < E|Y_i|^s < \infty$. Show that, when $n \to \infty$,

$$E\|X_n Y\|^s \to 0 \implies E\|X_n\|^s \to 0.$$

4. Let $A = EA_t$, $\underline{b} = E\underline{b}_t$, and suppose there exists an integer $N$ such that $A^N\underline{b} > 0$ (in the sense that all elements of this vector are strictly positive). Show that there exists $k_0 \geq 1$ such that

$$E(\|A_{k_0}A_{k_0-1} \cdots A_1\|^s) < 1.$$

5. Deduce (2.35) from the preceding question.
6. Is the condition $\alpha_1 + \beta_1 > 0$ necessary?

2.14 (Positivity conditions)
In the GARCH(1, q) case give a more explicit form for the conditions in (2.49). Show, by taking $q = 2$, that these conditions are less restrictive than (2.20).

2.15 (A lower bound for the first autocorrelations of the square of an ARCH)
Let $(\epsilon_t)$ be an ARCH(q) process admitting moments of order 4. Show that, for $i = 1, \ldots, q$,

$$\rho_{\epsilon^2}(i) \geq \alpha_i.$$

2.16 (Asymptotic decrease of the autocorrelations of the square of an ARCH(2) process)
Figure 2.12 shows that the first autocorrelations of the square of an ARCH(2) process, admitting moments of order 4, can be nondecreasing. Show that this sequence decreases after a certain lag.

2.17 (Convergence in probability to $-\infty$)
Show that if $(X_n)$ and $(Y_n)$ are two independent sequences of random variables such that $X_n + Y_n \to -\infty$ and $X_n \not\to -\infty$ in probability, then $Y_n \to -\infty$ in probability.

2.18 (GARCH model with a random coefficient)
Proceeding as in the proof of Theorem 2.1, study the stationarity of the GARCH(1, 1) model with random coefficient $\omega = \omega(\eta_{t-1})$,

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, \\ \sigma_t^2 = \omega(\eta_{t-1}) + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \end{cases} \tag{2.67}$$

under the usual assumptions and $\omega(\eta_{t-1}) > 0$ a.s. Use the result of Exercise 2.17 to deal with the case $\gamma := E\log a(\eta_t) = 0$.

2.19 (RiskMetrics model)
The RiskMetrics model used to compute the value at risk (see Chapter 12) relies on the following equations:

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, \quad (\eta_t) \text{ iid } \mathcal{N}(0, 1), \\ \sigma_t^2 = \lambda\sigma_{t-1}^2 + (1 - \lambda)\epsilon_{t-1}^2, \end{cases}$$

where $0 < \lambda < 1$. Show that this model has no stationary and nontrivial solution.
2.20 (IARCH($\infty$) models: proof of Corollary 2.6)

1. In model (2.40), under the assumption $A_1 = 1$, show that

$$\sum_{i=1}^{\infty}\psi_i\log\psi_i \leq 0 \quad \text{and} \quad E(\eta_0^2\log\eta_0^2) \geq 0.$$

2. Suppose that (2.41) holds. Show that the function $f : [p, 1] \to \mathbb{R}$ defined by $f(q) = \log(A_q\mu_{2q})$ is convex. Compute its derivative at 1 and deduce that (2.52) holds.
3. Establish the converse and show that $E|\epsilon_t|^q < \infty$ for any $q \in [0, 2)$.


2.21 (On the predictive power of GARCH)
In order to evaluate the quality of the prediction of $\epsilon_t^2$ obtained by using the volatility of a GARCH(1, 1) model, econometricians have considered the linear regression

$$\epsilon_t^2 = a + b\sigma_t^2 + u_t,$$

where $\sigma_t^2$ is replaced by the volatility estimated from the model. They generally obtained a very small determination coefficient, meaning that the quality of the regression was bad. Is this surprising? In order to answer that question compute, under the assumption that $E\epsilon_t^4$ exists, the theoretical $R^2$ defined by

$$R^2 = \frac{\operatorname{Var}(\sigma_t^2)}{\operatorname{Var}(\epsilon_t^2)}.$$

Show, in particular, that $R^2 < 1/\kappa_\eta$, where $\kappa_\eta = E\eta_t^4$.


3 Mixing*


It will be shown that, under mild conditions, GARCH processes are geometrically ergodic and $\beta$-mixing. These properties entail the existence of laws of large numbers and of central limit theorems (see Appendix A), and thus play an important role in the statistical analysis of GARCH processes. This chapter relies on the Markov chain techniques set out, for example, by Meyn and Tweedie (1996).


3.1 Markov Chains with Continuous State Space 


Recall that for a Markov chain only the most recent past is of use in obtaining the conditional distribution. More precisely, $(X_t)$ is said to be a homogeneous Markov chain, evolving on a space $E$ (called the state space) equipped with a $\sigma$-field $\mathcal{E}$, if for all $x \in E$ and all $B \in \mathcal{E}$,

$$\forall s, t \in \mathbb{N}, \quad P(X_{s+t} \in B \mid X_r, r < s; X_s = x) := P^t(x, B). \tag{3.1}$$

In this equation, $P^t(x, B)$ corresponds to the transition probability of moving from the state $x$ to the set $B$ in $t$ steps. The Markov property refers to the fact that $P^t(x, B)$ does not depend on $X_r$, $r < s$. The fact that this probability does not depend on $s$ is referred to as time homogeneity. For simplicity we write $P(x, B) = P^1(x, B)$. The function $P : E \times \mathcal{E} \to [0, 1]$ is called a transition kernel and satisfies:

(i) $\forall B \in \mathcal{E}$, the function $P(\cdot, B)$ is measurable;
(ii) $\forall x \in E$, the function $P(x, \cdot)$ is a probability measure on $(E, \mathcal{E})$.

The law of the process $(X_t)$ is characterized by an initial probability measure $\mu$ and a transition kernel $P$. For all integers $t$ and all $(t+1)$-tuples $(B_0, \ldots, B_t)$ of elements of $\mathcal{E}$, we set

$$P_\mu(X_0 \in B_0, \ldots, X_t \in B_t) = \int_{x_0 \in B_0} \cdots \int_{x_{t-1} \in B_{t-1}} \mu(dx_0)P(x_0, dx_1) \cdots P(x_{t-1}, B_t). \tag{3.2}$$

In what follows, $(X_t)$ denotes a Markov chain on $E = \mathbb{R}^d$ and $\mathcal{E}$ is the Borel $\sigma$-field.


GARCH Models: Structure, Statistical Inference and Financial Applications, Christian Francq and Jean-Michel Zakoian
© 2010 John Wiley & Sons, Ltd 


64 GARCH MODELS 


Irreducibility and Recurrence 


The Markov chain $(X_t)$ is said to be $\phi$-irreducible for a nontrivial (that is, not identically equal to zero) measure $\phi$ on $(E, \mathcal{E})$ if

$$\forall B \in \mathcal{E}, \quad \phi(B) > 0 \implies \forall x \in E, \ \exists t > 0, \ P^t(x, B) > 0.$$


If $(X_t)$ is $\phi$-irreducible, it can be shown that there exists a maximal irreducibility measure, that is, an irreducibility measure $M$ such that all the other irreducibility measures are absolutely continuous with respect to $M$. If $M(B) = 0$ then the set of points from which $B$ is accessible is also of zero measure (see Meyn and Tweedie, 1996, Proposition 4.2.2). Such a measure $M$ is not unique, but the set

$$\mathcal{E}^+ = \{B \in \mathcal{E} \mid M(B) > 0\}$$

does not depend on the maximal irreducibility measure $M$. For a particular model, finding a measure that makes the chain irreducible may be a nontrivial problem (but see Exercise 3.1 for an example of a time series model for which the determination of such a measure is very simple).
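A $\phi$-irreducible chain is called recurrent if

$$U(x, B) := \sum_{t=1}^{\infty} P^t(x, B) = +\infty, \quad \forall x \in E, \ \forall B \in \mathcal{E}^+,$$

and is called transient if

$$\exists (B_j)_j, \quad E = \bigcup_j B_j, \quad U(x, B_j) \leq M_j < \infty, \quad \forall x \in E.$$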


Note that $U(x, B) = E\{\sum_{t=1}^{\infty}\mathbb{1}_B(X_t) \mid X_0 = x\}$ can be interpreted as the average time that the chain spends in $B$ when it starts at $x$. It can be shown that a $\phi$-irreducible chain $(X_t)$ is either recurrent or transient (see Meyn and Tweedie, 1996, Theorem 8.3.4). It is said that $(X_t)$ is positive recurrent if

$$\limsup_{t \to \infty} P^t(x, B) > 0, \quad \forall x \in E, \ \forall B \in \mathcal{E}^+.$$


If a recurrent $\phi$-irreducible chain is not positive recurrent, it is called null recurrent. For a $\phi$-irreducible chain, positive recurrence is equivalent to the existence of a (unique) invariant probability measure (see Meyn and Tweedie, 1996, Theorem 18.2.2), that is, a probability $\pi$ such that

$$\forall B \in \mathcal{E}, \quad \pi(B) = \int P(x, B)\,\pi(dx).$$


An important consequence of this equivalence is that, for Markov time series, the issue of finding strict stationarity conditions reduces to that of finding conditions for positive recurrence. Indeed, it can be shown (see Exercise 3.2) that for any chain $(X_t)$ with initial measure $\mu$,

$$(X_t) \text{ is stationary} \iff \mu \text{ is invariant}. \tag{3.3}$$

For this reason, the invariant probability is also called the stationary probability.
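On a finite state space, a probability $\pi$ is invariant when the row vector $\pi$ satisfies $\pi = \pi P$, and (3.3) can be checked directly: started from $\pi$, the law of $X_t$ never changes, whereas a non-invariant initial law moves. The following Python sketch illustrates this on a hypothetical three-state chain (a lazy random walk on $\{0, 1, 2\}$, chosen only for illustration). It approximates $\pi$ by power iteration, which assumes that the chain is irreducible and aperiodic; the function names are illustrative.

```python
def step(mu, P):
    """Law of X_{t+1} from the law mu of X_t: (mu P)_j = sum_i mu_i P[i][j]."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


def invariant_probability(P, iterations=500):
    """Approximate the solution of pi = pi P by iterating mu -> mu P from the uniform law.

    Convergence is assumed; it holds for an irreducible aperiodic finite chain.
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = step(mu, P)
    return mu


def is_stationary(mu, P, horizon=20, tol=1e-12):
    """Numerically check that the law of X_t stays equal to mu for t = 1, ..., horizon."""
    law = list(mu)
    for _ in range(horizon):
        law = step(law, P)
        if max(abs(a - b) for a, b in zip(law, mu)) > tol:
            return False
    return True


# Hypothetical example: a lazy random walk on {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

pi = invariant_probability(P)
print([round(p, 6) for p in pi])          # [0.25, 0.5, 0.25]
print(is_stationary(pi, P))               # True: mu = pi is invariant
print(is_stationary([1.0, 0.0, 0.0], P))  # False: a Dirac mass at 0 is not
```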
Small Sets and Aperiodicity

For a $\phi$-irreducible chain, there exists a class of sets enjoying properties that are similar to those of the elementary states of a finite state space Markov chain. A set $C \in \mathcal{E}$ is called a small set¹ if there exist an integer $m \geq 1$ and a nontrivial measure $\nu$ on $\mathcal{E}$ such that

$$\forall x \in C, \ \forall B \in \mathcal{E}, \quad P^m(x, B) \geq \nu(B).$$


In the AR(1) case, for instance, it is easy to find small sets (see Exercise 3.4). For more sophisticated models, the definition is not sufficient and more explicit criteria are needed. For the so-called Feller chains, we will see below that it is very easy to find small sets. For a general chain, we have the following criterion (see Nummelin, 1984, Proposition 2.11): $C \in \mathcal{E}^+$ is a small set if there exists $A \in \mathcal{E}^+$ such that, for all $B \subset A$, $B \in \mathcal{E}^+$, there exists $T > 0$ such that

$$\inf_{x \in C}\sum_{t=0}^{T} P^t(x, B) > 0.$$


If the chain is $\phi$-irreducible, it can be shown that there exists a countable cover of $E$ by small sets. Moreover, each set $B \in \mathcal{E}^+$ contains a small set $C \in \mathcal{E}^+$. The existence of small sets allows us to define cycles for $\phi$-irreducible Markov chains with general state space, as in the case of countable space chains. More precisely, the period $d$ is the greatest common divisor (gcd) of the set

$$\{n \geq 1 \mid \forall x \in C, \ \forall B \in \mathcal{E}, \ P^n(x, B) \geq \delta_n\nu(B), \text{ for some } \delta_n > 0\},$$

where $C \in \mathcal{E}^+$ is any small set (the gcd is independent of the choice of $C$). When $d = 1$, the chain is said to be aperiodic. Moreover, it can be shown (see Meyn and Tweedie, 1996, Theorem 5.4.4) that there exist disjoint sets $D_1, \ldots, D_d \in \mathcal{E}$ such that (with the convention $D_{d+1} = D_1$):

(i) $\forall i = 1, \ldots, d$, $\forall x \in D_i$, $P(x, D_{i+1}) = 1$;
(ii) $\phi\left(E - \bigcup_{i=1}^{d} D_i\right) = 0$.
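For a finite irreducible chain the definition of the period can be made concrete. Assuming a finite state space, a singleton $C = \{x\}$ is a small set (take $m$ with $P^m(x, x) > 0$ and $\nu = P^m(x, x)\delta_x$), and the set whose gcd defines $d$ reduces to $\{n \geq 1 \mid P^n(x, x) > 0\}$. The Python sketch below computes this gcd from the support of the transition matrix alone; the three example chains and the function name are hypothetical.

```python
from math import gcd


def period(P, x):
    """gcd of {n >= 1 : P^n(x, x) > 0} for a finite irreducible chain with matrix P.

    Only the support of P matters. Return times n <= 3 * len(P) suffice: every
    simple cycle (length <= len(P)) can be reached from x, and x reached back,
    within 2 * (len(P) - 1) steps.
    """
    n_states = len(P)
    d = 0                 # gcd(0, n) == n, so 0 means "no return seen yet"
    reachable = {x}       # states y with P^n(x, y) > 0, starting with n = 0
    for n in range(1, 3 * n_states + 1):
        reachable = {j for i in reachable for j in range(n_states) if P[i][j] > 0}
        if x in reachable:
            d = gcd(d, n)
    return d


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]                     # deterministic rotation
flip = [[0, 1], [1, 0]]                                        # alternation
lazy = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]  # has self-loops
print(period(cycle, 0), period(flip, 1), period(lazy, 0))      # 3 2 1
```

The rotation has period 3, the alternation has period 2, and any self-loop forces aperiodicity ($d = 1$).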


A necessary and sufficient condition for the aperiodicity of $(X_t)$ is that there exists $A \in \mathcal{E}^+$ such that for all $B \subset A$, $B \in \mathcal{E}^+$, there exists $t > 0$ such that

$$P^t(x, B) > 0 \quad \text{and} \quad P^{t+1}(x, B) > 0, \quad \forall x \in B \tag{3.4}$$

(see Chan, 1990, Proposition A1.2).


Geometric Ergodicity and Mixing 


In this section, we study the convergence of the probability $P_\mu(X_t \in \cdot)$ to a probability $\pi(\cdot)$ independent of the initial probability $\mu$, as $t \to \infty$.
It is easy to see that if there exists a probability measure $\pi$ such that, for an initial measure $\mu$,

$$\forall B \in \mathcal{E}, \quad P_\mu(X_t \in B) \to \pi(B), \quad \text{when } t \to +\infty, \tag{3.5}$$

where $P_\mu(X_t \in B)$ is defined in (3.2) (for $(B_0, \ldots, B_t) = (E, \ldots, E, B)$), then the probability $\pi$ is invariant (see Exercise 3.3). Note also that (3.5) holds for any measure $\mu$ if and only if

$$\forall B \in \mathcal{E}, \ \forall x \in E, \quad P^t(x, B) \to \pi(B), \quad \text{when } t \to +\infty.$$


¹Meyn and Tweedie (1996) introduce a more general notion, called a ‘petite set’, obtained by replacing, in the definition, the transition probability in $m$ steps by an average of the transition probabilities, $\sum_{m=0}^{\infty} a_m P^m(x, B)$, where $(a_m)$ is a probability distribution.
On the other hand, if the chain is irreducible, aperiodic and admits an invariant probability $\pi$, then for $\pi$-almost all $x \in E$,

$$\|P^t(x, \cdot) - \pi\| \to 0 \quad \text{when } t \to +\infty, \tag{3.6}$$

where $\|\cdot\|$ denotes the total variation norm² (see Meyn and Tweedie, 1996, Theorem 14.0.1). A chain $(X_t)$ such that the convergence (3.6) holds for all $x$ is said to be ergodic. However, this convergence is not sufficient for mixing. We will define a stronger notion of ergodicity.

The chain $(X_t)$ is called geometrically ergodic if there exists $\rho \in (0, 1)$ such that

$$\forall x \in E, \quad \rho^{-t}\|P^t(x, \cdot) - \pi\| \to 0 \quad \text{when } t \to +\infty. \tag{3.7}$$


Geometric ergodicity entails the so-called $\alpha$- and $\beta$-mixing. The general definition of the $\alpha$- and $\beta$-mixing coefficients is given in Appendix A.3.1. For a stationary Markov process, the definition of the $\alpha$-mixing coefficient reduces to

$$\alpha_X(k) = \sup_{f, g}|\operatorname{Cov}(f(X_0), g(X_k))|,$$

where the supremum is taken over the set of measurable functions $f$ and $g$ such that $|f| \leq 1$, $|g| \leq 1$ (see Bradley, 1986, 2005). A general process $X = (X_t)$ is said to be $\alpha$-mixing ($\beta$-mixing) if $\alpha_X(k)$ ($\beta_X(k)$) converges to 0 as $k \to \infty$. Intuitively, these mixing properties characterize the decrease in dependence when past and future become sufficiently far apart. $\alpha$-mixing is sometimes called strong mixing, but $\beta$-mixing entails strong mixing because $\alpha_X(k) \leq \beta_X(k)$ (see Appendix A.3.1).

Davydov (1973) showed that for an ergodic Markov chain $(X_t)$, of invariant probability measure $\pi$,

$$\beta_X(k) = \int\|P^k(x, \cdot) - \pi\|\,\pi(dx).$$

It follows that $\beta_X(k) = O(\rho^k)$ if the convergence (3.7) holds. Thus

$$(X_t) \text{ is stationary and geometrically ergodic} \implies (X_t) \text{ is geometrically } \beta\text{-mixing}. \tag{3.8}$$
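A two-state chain makes (3.7), Davydov's formula and (3.8) fully explicit. For $P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$ with $0 < a + b < 2$, one has $\pi = (b, a)/(a + b)$ and $P^t = \Pi + \lambda^t(I - \Pi)$, where each row of $\Pi$ equals $\pi$ and $\lambda = 1 - a - b$. Hence $\|P^t(0, \cdot) - \pi\| = 2\pi(1)|\lambda|^t$ and $\|P^t(1, \cdot) - \pi\| = 2\pi(0)|\lambda|^t$, so (3.7) holds for any $\rho \in (|\lambda|, 1)$ and $\beta_X(k) = 4\pi(0)\pi(1)|\lambda|^k$. The Python sketch below checks this numerically for hypothetical values of $a$ and $b$, using the norm of footnote 2 (with the normalization $\sup_B|m(B)|$, all values would be halved).

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(P, t):
    """P^t for t >= 0 (P^0 is the identity)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, P)
    return result


def norm(m):
    """Norm of footnote 2 on a finite set: sup over |f| <= 1 of sum f dm, i.e. sum |m_j|."""
    return sum(abs(v) for v in m)


def beta_davydov(P, pi, k):
    """Davydov's formula: integral of ||P^k(x, .) - pi|| with respect to pi(dx)."""
    Pk = mat_pow(P, k)
    return sum(pi[x] * norm([Pk[x][j] - pi[j] for j in range(len(pi))]) for x in range(len(pi)))


a, b = 0.3, 0.2                    # hypothetical switching probabilities
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]    # invariant law, here (0.4, 0.6)
lam = 1 - a - b                    # second eigenvalue of P, here 0.5

for k in range(1, 6):
    Pk = mat_pow(P, k)
    sup_tv = max(norm([Pk[x][j] - pi[j] for j in range(2)]) for x in range(2))
    print(k, round(sup_tv, 6), round(beta_davydov(P, pi, k), 6),
          round(4 * pi[0] * pi[1] * lam ** k, 6))
```

The second column equals $2\max\{\pi(0), \pi(1)\}|\lambda|^k$, and the last two columns coincide, both halving at each step since $\lambda = 0.5$ here.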


Two Ergodicity Criteria 


For particular models, it is generally not easy to directly verify the properties of recurrence, existence of an invariant probability law, and geometric ergodicity. Fortunately, there exist simple criteria on the transition kernel.

We begin by defining the notion of Feller chain. The Markov chain $(X_t)$ is said to be a Feller chain if, for all bounded continuous functions $g$ defined on $E$, the function of $x$ defined by $E(g(X_t) \mid X_{t-1} = x)$ is continuous. For instance, for an AR(1) we have, with obvious notation,

$$E\{g(X_t) \mid X_{t-1} = x\} = E\{g(\theta x + \epsilon_t)\}.$$

The continuity of the function $x \mapsto g(\theta x + y)$ for all $y$, and its boundedness, ensure, by the Lebesgue dominated convergence theorem, that $(X_t)$ is a Feller chain. For a Feller chain, the compact sets $C \in \mathcal{E}^+$ are small sets (see Feigin and Tweedie, 1985).

The following theorem provides an effective way to show the geometric ergodicity (and thus the $\beta$-mixing) of numerous Markov processes.


²The total variation norm of a (signed) measure $m$ is defined by $\|m\| = \sup\int f\,dm$, where the supremum is taken over $\{f : E \to \mathbb{R}, f \text{ measurable and } |f| \leq 1\}$.
Theorem 3.1 (Feigin and Tweedie, 1985, Theorem 1) Assume that:
(i) $(X_t)$ is a Feller chain;
(ii) $(X_t)$ is $\phi$-irreducible;
(iii) there exist a compact set $A \subset E$ such that $\phi(A) > 0$ and a continuous function $V : E \to \mathbb{R}^+$ satisfying

$$V(x) \geq 1, \quad \forall x \in A, \tag{3.9}$$

and, for some $\delta > 0$,

$$E\{V(X_t) \mid X_{t-1} = x\} \leq (1 - \delta)V(x), \quad \forall x \notin A. \tag{3.10}$$

Then $(X_t)$ is geometrically ergodic.


This theorem will be applied to GARCH processes in the next section (see also Exercise 3.5 
for a bilinear example). In equation (3.10), V can be interpreted as an energy function. When the 
chain is outside the center A of the state space, the energy dissipates, on average. When the chain 
lies inside A, the energy is bounded, by the compactness of A and the continuity of V. Sometimes 
V is called a test function and (iii) is said to be a drift criterion. 

Let us explain why these assumptions imply the existence of an invariant probability measure. For simplicity, assume that the test function $V$ takes its values in $[1, +\infty)$, which will be the case for the applications to GARCH models we will present in the next section. Denote by $P$ the operator which, to a measurable function $f$ on $E$, associates the function $Pf$ defined by

$$\forall x \in E, \quad Pf(x) = \int_E f(y)P(x, dy) = E\{f(X_t) \mid X_{t-1} = x\}.$$

Let $P^t$ be the $t$th iteration of $P$, obtained by replacing $P(x, dy)$ by $P^t(x, dy)$ in the previous integral. By convention $P^0 f = f$ and $P^0(x, A) = \mathbb{1}_A(x)$. Equations (3.9) and (3.10) and the boundedness of $V$ by some $M > 0$ on $A$ yield an inequality of the form

$$PV \leq (1 - \delta)V + b\mathbb{1}_A,$$

where $b = M - (1 - \delta)$. Iterating this relation $t$ times, we obtain, for $x_0 \in A$,

$$\forall t \geq 0, \quad P^{t+1}V(x_0) \leq (1 - \delta)P^tV(x_0) + bP^t(x_0, A). \tag{3.11}$$


It follows (see Exercise 3.6) that there exists a constant $\kappa > 0$ such that for $n$ large enough,

$$Q_n(x_0, A) \geq \kappa \quad \text{where} \quad Q_n(x_0, A) = \frac{1}{n}\sum_{t=1}^{n} P^t(x_0, A). \tag{3.12}$$

The sequence $Q_n(x_0, \cdot)$ being a sequence of probabilities on $(E, \mathcal{E})$, it admits an accumulation point for vague convergence: there exist a measure $\pi$ of mass at most 1 and a subsequence $(n_k)$ such that for all continuous functions $f$ with compact support,

$$\lim_{k \to \infty}\int_E f(y)Q_{n_k}(x_0, dy) = \int_E f(y)\pi(dy). \tag{3.13}$$

In particular, if we take $f = \mathbb{1}_A$ in this equality, we obtain $\pi(A) \geq \kappa$, thus $\pi$ is not equal to zero. Finally, it can be shown that $\pi$ is a probability and that (3.13) entails that $\pi$ is an invariant probability for the chain $(X_t)$ (see Exercise 3.7).
For some models, the drift criterion (iii) is too restrictive because it relies on transitions in only one step. The following criterion, adapted from Meyn and Tweedie (1996, Theorems 19.1.3, 6.2.9 and 6.2.5), is an interesting alternative relying on the transitions in $n$ steps.

Theorem 3.2 (Geometric ergodicity criterion) Assume that:
(i) $(X_t)$ is an aperiodic Feller chain;
(ii) $(X_t)$ is $\phi$-irreducible, where the support of $\phi$ has nonempty interior;
(iii) there exist a compact $C \subset E$, an integer $n \geq 1$ and a continuous function $V : E \to \mathbb{R}^+$ satisfying

$$1 \leq V(x), \quad \forall x \in C, \tag{3.14}$$

and, for some $\delta > 0$ and $b > 0$,

$$E\{V(X_{t+n}) \mid X_{t-1} = x\} \leq (1 - \delta)V(x), \quad \forall x \notin C, \tag{3.15}$$
$$E\{V(X_{t+n}) \mid X_{t-1} = x\} \leq b, \quad \forall x \in C.$$

Then $(X_t)$ is geometrically ergodic.


The compact $C$ of condition (iii) can be replaced by a small set but the function $V$ must be bounded on $C$. When $(X_t)$ is not a Feller chain, a similar criterion exists, for which it is necessary to consider such small sets (see Meyn and Tweedie, 1996, Theorem 19.1.3).


3.2 Mixing Properties of GARCH Processes 


We begin with the ARCH(1) process because this is the only case where the process $(\epsilon_t)$ is Markovian.


The ARCH(1) Case 


Consider the model

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, \\ \sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2, \end{cases} \tag{3.16}$$

where $\omega > 0$, $\alpha \geq 0$ and $(\eta_t)$ is a sequence of iid (0, 1) variables. The following theorem establishes the mixing property of the ARCH(1) process under the necessary and sufficient strict stationarity condition (see Theorem 2.1 and (2.10)). An extra assumption on the distribution of $\eta_t$ is required, but this assumption is mild:


Assumption A The law $P_\eta$ of $\eta_t$ is absolutely continuous, of density $f$ with respect to the Lebesgue measure $\lambda$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. We assume that

$$\inf\{\eta \mid \eta > 0, f(\eta) > 0\} = \inf\{-\eta \mid \eta < 0, f(\eta) > 0\} := \eta^0, \tag{3.17}$$

and that there exists $\tau > 0$ such that

$$(-\eta^0 - \tau, -\eta^0) \cup (\eta^0, \eta^0 + \tau) \subset \{f > 0\}.$$


Note that this assumption includes, in particular, the standard case where $f$ is positive over a neighborhood of 0, possibly over all $\mathbb{R}$. We then have $\eta^0 = 0$. Equality (3.17) implies some (local) symmetry of the law of $\eta_t$. This symmetry facilitates the proof of the following theorem, but it can be omitted (see Exercise 3.8).


Theorem 3.3 (Mixing of the ARCH(1) model) Under Assumption A and for

$$\alpha < e^{-E\log\eta_t^2}, \tag{3.18}$$

the nonanticipative strictly stationary solution of the ARCH(1) model (3.16) is geometrically ergodic, and thus geometrically $\beta$-mixing.


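Proof. Let $\psi(x) = (\omega + \alpha x^2)^{1/2}$. A process $(\epsilon_t)$ satisfying

$$\epsilon_t = \psi(\epsilon_{t-1})\eta_t, \quad t \geq 1,$$

where $\eta_t$ is independent of $\epsilon_{t-i}$, $i > 0$, is clearly a homogeneous Markov chain on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, with transition probabilities

$$P(x, B) = P(\epsilon_1 \in B \mid \epsilon_0 = x) = \int_{\frac{1}{\psi(x)}B} dP_\eta(y).$$

We will show that the conditions of Theorem 3.1 are satisfied.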


Step (i). We have

$$E\{g(\epsilon_t) \mid \epsilon_{t-1} = x\} = E[g\{\psi(x)\eta_t\}].$$

If $g$ is continuous and bounded, the same is true for the function $x \mapsto g\{\psi(x)y\}$, for all $y$. By the Lebesgue theorem, it follows that $(\epsilon_t)$ is a Feller chain.


Step (ii). To show the $\phi$-irreducibility of the chain, for some measure $\phi$, assume for the moment that $\eta^0 = 0$ in Assumption A. Suppose, for instance, that $f$ is positive on $[0, \tau)$. Let $\phi$ be the restriction of the Lebesgue measure to the interval $[0, \sqrt{\omega}\tau)$. Since $\psi(x) \geq \sqrt{\omega}$, it can be seen that

$$\phi(B) > 0 \implies \lambda\left\{\frac{1}{\psi(x)}B \cap [0, \tau)\right\} > 0 \implies P(x, B) > 0.$$

It follows that the chain $(\epsilon_t)$ is $\phi$-irreducible. In particular, $\phi = \lambda$ if $\eta_t$ has a positive density over $\mathbb{R}$.
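The proof of the irreducibility in the case $\eta^0 > 0$ is more difficult. First note that

$$E\log\alpha\eta_t^2 = \int_{(-\infty, -\eta^0] \cup [\eta^0, +\infty)}\log(\alpha x^2)f(x)\,d\lambda(x) \geq \log\alpha(\eta^0)^2.$$

Now $E\log\alpha\eta_t^2 < 0$ by (3.18). Thus we have

$$\rho := \alpha(\eta^0)^2 < 1.$$

Let $\tau' \in (0, \tau)$ be small enough such that

$$\rho_1 := \alpha(\eta^0 + \tau')^2 < 1.$$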
Iterating the model, we obtain that, for $\epsilon_0 = x$ fixed,

$$\epsilon_t^2 = \omega\eta_t^2 + \alpha\omega\eta_t^2\eta_{t-1}^2 + \cdots + \alpha^{t-1}\omega\eta_t^2\cdots\eta_1^2 + \alpha^t\eta_t^2\cdots\eta_1^2x^2.$$

It follows that the function

$$Y_t = (\eta_1^2, \ldots, \eta_t^2) \mapsto Z_t = (\epsilon_1^2, \ldots, \epsilon_t^2)$$

is a diffeomorphism between open subsets of $\mathbb{R}^t$. Moreover, in view of Assumption A, the vector $Y_t$ has a density on $\mathbb{R}^t$. The same is thus true for $Z_t$, and it follows that, given $\epsilon_0 = x$,

$$\epsilon_t^2 \text{ has a density with respect to } \lambda. \tag{3.19}$$
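We now introduce the event

$$\mathcal{E}_t = \bigcap_{s=1}^{t}\left\{\eta_s \in (-\eta^0 - \tau', -\eta^0) \cup (\eta^0, \eta^0 + \tau')\right\}. \tag{3.20}$$

Assumption A implies that $P(\mathcal{E}_t) > 0$. Conditional on $\mathcal{E}_t$, we have

$$\epsilon_t^2 \in I_t := \left[\omega(\eta^0)^2\frac{1 - \rho^t}{1 - \rho} + \rho^tx^2, \ \omega(\eta^0 + \tau')^2\frac{1 - \rho_1^t}{1 - \rho_1} + \rho_1^tx^2\right].$$

Since values arbitrarily close to both bounds of the interval $I_t$ are attained, the intermediate value theorem and (3.19) entail that, given $\epsilon_0 = x$, $\epsilon_t^2$ has, conditionally on $\mathcal{E}_t$, a positive density on $I_t$. It follows that

$$\epsilon_t \text{ has, conditionally on } \mathcal{E}_t, \text{ a positive density on } J_t, \tag{3.21}$$

where $J_t = \{x \in \mathbb{R} \mid x^2 \in I_t\}$. Let

$$I = \lim_{t \to \infty} I_t = \left[\frac{\omega(\eta^0)^2}{1 - \rho}, \frac{\omega(\eta^0 + \tau')^2}{1 - \rho_1}\right], \qquad J = \{x \in \mathbb{R} \mid x^2 \in I\},$$

and let $\lambda_J$ be the restriction of the Lebesgue measure to $J$. We have

$$\lambda_J(B) > 0 \implies \exists t, \ \lambda(B \cap J_t) > 0 \implies \exists t, \ P(\epsilon_t \in B \mid \epsilon_0 = x) \geq P(\epsilon_t \in B \mid \{\epsilon_0 = x\} \cap \mathcal{E}_t)P(\mathcal{E}_t) > 0.$$

The chain $(\epsilon_t)$ is thus $\phi$-irreducible with $\phi = \lambda_J$.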


Step (iii). We shall use Lemma 2.2. The variable $\alpha\eta_t^2$ is almost surely positive and satisfies $E(\alpha\eta_t^2) = \alpha < \infty$ and $E\log\alpha\eta_t^2 < 0$, in view of assumption (3.18). Thus, there exists $s > 0$ such that

$$c := \alpha^s\mu_{2s} < 1,$$

where $\mu_{2s} = E|\eta_t|^{2s}$. The proof of Lemma 2.2 shows that we can assume $s < 1$. Let $V(x) = 1 + |x|^{2s}$. Condition (3.9) is obviously satisfied for all $x$. Let $0 < \delta < 1 - c$ and let the compact set

$$A = \{x \in \mathbb{R}; \ \omega^s\mu_{2s} + \delta + (c - 1 + \delta)|x|^{2s} \geq 0\}.$$

Since $A$ is a nonempty closed interval with center 0, we have $\phi(A) > 0$. Moreover, by the inequality $(a + b)^s \leq a^s + b^s$ for $a, b \geq 0$ and $s \in [0, 1]$ (see the proof of Corollary 2.3), we have, for $x \notin A$,

$$\begin{aligned} E[V(\epsilon_t) \mid \epsilon_{t-1} = x] &\leq 1 + (\omega^s + \alpha^s|x|^{2s})\mu_{2s} \\ &= 1 + \omega^s\mu_{2s} + c|x|^{2s} \\ &< (1 - \delta)V(x), \end{aligned}$$

which proves (3.10). It follows that the chain $(\epsilon_t)$ is geometrically ergodic. Therefore, in view of (3.8), the chain obtained with the invariant law as initial measure is geometrically $\beta$-mixing. The proof of the theorem is complete.
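Step (iii) can be traced numerically when $\eta_t \sim \mathcal{N}(0, 1)$, for which $\mu_{2s} = 2^s\Gamma(s + 1/2)/\sqrt{\pi}$. The Python sketch below uses hypothetical values $\omega = 1$ and $\alpha = 1.5$ (so that $E\epsilon_t^2 = \infty$ while (3.18) holds). It chooses $s$ on a grid, forms $c$, $\delta$ and the interval $A$, and checks (3.10) at a few points outside $A$, using the exact conditional expectation $E[V(\epsilon_t) \mid \epsilon_{t-1} = x] = 1 + (\omega + \alpha x^2)^s\mu_{2s}$. Taking $\delta$ in the middle of $(0, 1 - c)$ is an arbitrary choice, and the function names are illustrative.

```python
import math


def mu_2s(s):
    """mu_{2s} = E|eta_t|^(2s) for eta_t ~ N(0, 1), namely 2^s Gamma(s + 1/2) / sqrt(pi)."""
    return 2.0 ** s * math.gamma(s + 0.5) / math.sqrt(math.pi)


def best_s(alpha, grid=1000):
    """Grid point s in (0, 1] minimizing c(s) = alpha^s * mu_{2s}; returns (s, c)."""
    pairs = [(i / grid, alpha ** (i / grid) * mu_2s(i / grid)) for i in range(1, grid + 1)]
    return min(pairs, key=lambda pair: pair[1])


def drift_check(omega, alpha, xs):
    """Check (3.10) with V(x) = 1 + |x|^(2s) at the points of xs lying outside A."""
    s, c = best_s(alpha)
    if c >= 1.0:
        raise ValueError("no grid value of s gives alpha^s * mu_2s < 1")
    delta = (1.0 - c) / 2.0            # any delta in (0, 1 - c) would do
    m = mu_2s(s)
    # A = {x : omega^s m + delta + (c - 1 + delta)|x|^(2s) >= 0} = [-radius, radius]
    radius = ((omega ** s * m + delta) / (1.0 - c - delta)) ** (1.0 / (2.0 * s))
    checks = []
    for x in xs:
        if abs(x) <= radius:
            continue                   # (3.10) is only required outside A
        lhs = 1.0 + (omega + alpha * x * x) ** s * m   # exact E[V(eps_t) | eps_{t-1} = x]
        rhs = (1.0 - delta) * (1.0 + abs(x) ** (2.0 * s))
        checks.append(lhs < rhs)
    return s, c, radius, checks


s, c, radius, checks = drift_check(omega=1.0, alpha=1.5, xs=[10.0 ** k for k in range(1, 9)])
print(s, round(c, 3), round(radius, 1), checks)
```

With $\alpha = 4$, which violates (3.18) for Gaussian errors, $\alpha^s\mu_{2s} > 1$ for every $s \in (0, 1]$ and the function raises an error, in line with the role of Lemma 2.2.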
Remark 3.1 (Case where the law of $\eta_t$ does not have a density) The condition on the density of $\eta_t$ is not necessary for the mixing property. Suppose, for example, that $\eta_t^2 = 1$ a.s. (that is, $\eta_t$ takes the values $-1$ and 1, with probability 1/2 each). The strict stationarity condition reduces to $\alpha < 1$ and the strictly stationary solution is $\epsilon_t = \sqrt{\omega/(1 - \alpha)}\,\eta_t$ a.s. This solution is mixing since it is an independent white noise.

Another pathological example is obtained when $\eta_t$ has a mass at 0: $P(\eta_t = 0) = \theta > 0$. Regardless of the value of $\alpha$, the process is strictly stationary because the right-hand side of inequality (3.18) is equal to $+\infty$. A noticeable feature of this chain is the existence of regeneration times at which the past is forgotten. Indeed, if $\eta_t = 0$ then $\epsilon_t = 0$, $\epsilon_{t+1} = \sqrt{\omega}\,\eta_{t+1}, \ldots$. It is easy to see that the process is then mixing, regardless of $\alpha$.
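For standard Gaussian $\eta_t$ the right-hand side of (3.18) is explicit: $E\log\eta_t^2 = -(\gamma_E + \log 2)$, where $\gamma_E \approx 0.5772$ is Euler's constant, so (3.18) reads $\alpha < 2e^{\gamma_E} \approx 3.562$. In particular, Theorem 3.3 covers values $\alpha \geq 1$ for which $E\epsilon_t^2 = \infty$. The Python sketch below, assuming $\mathcal{N}(0, 1)$ errors, compares this closed form with a Monte Carlo estimate; the sample size, seed and function names are arbitrary.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def alpha_bound_gaussian():
    """Right-hand side of (3.18) for eta_t ~ N(0, 1).

    E log eta_t^2 = -(EULER_GAMMA + log 2) for a standard normal, so the bound
    exp(-E log eta_t^2) equals 2 * exp(EULER_GAMMA).
    """
    return math.exp(EULER_GAMMA + math.log(2.0))


def alpha_bound_monte_carlo(n=200_000, seed=0):
    """Monte Carlo estimate of exp(-E log eta_t^2) from simulated N(0, 1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = 0.0
        while z == 0.0:            # log(0) is undefined; this event has probability zero
            z = rng.gauss(0.0, 1.0)
        total += math.log(z * z)
    return math.exp(-total / n)


bound = alpha_bound_gaussian()
print(round(bound, 4))                       # 3.5621
print(round(alpha_bound_monte_carlo(), 2))   # Monte Carlo approximation of the same bound
for alpha in (1.0, 3.0, 4.0):
    print(alpha, alpha < bound)              # (3.18) holds for 1.0 and 3.0, fails for 4.0
```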


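The GARCH(1, 1) Case
Let us consider the GARCH(1, 1) model

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, \\ \sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \end{cases} \tag{3.22}$$

where $\omega > 0$, $\alpha \geq 0$, $\beta \geq 0$ and the sequence $(\eta_t)$ is as in the previous section. In this case $(\sigma_t)$ is Markovian, but $(\epsilon_t)$ is not Markovian when $\beta > 0$. The following result extends Theorem 3.3.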


Theorem 3.4 (Mixing of the GARCH(1, 1) model) Under Assumption A and if

$$E\log(\alpha\eta_t^2 + \beta) < 0, \tag{3.23}$$

then the nonanticipative strictly stationary solution of the GARCH(1, 1) model (3.22) is such that the Markov chain $(\sigma_t)$ is geometrically ergodic and the process $(\epsilon_t)$ is geometrically $\beta$-mixing.


Proof. If $\alpha = 0$ the strictly stationary solution is iid, and the conclusion of the theorem follows in this case. We now assume that $\alpha > 0$. We first show the conclusions of the theorem that concern the process $(\sigma_t)$. A homogeneous Markov chain $(\sigma_t)$ is defined on $(\mathbb{R}^+, \mathcal{B}(\mathbb{R}^+))$ by setting, for $t \geq 1$,

$$\sigma_t^2 = \omega + a(\eta_{t-1})\sigma_{t-1}^2, \tag{3.24}$$

where $a(x) = \alpha x^2 + \beta$. Its transition probabilities are given by

$$\forall x > 0, \ \forall B \in \mathcal{B}(\mathbb{R}^+), \quad P(x, B) = P(\sigma_1 \in B \mid \sigma_0 = x) = \int_{B_x} dP_\eta(y),$$

where $B_x = \{\eta; \ \{\omega + a(\eta)x^2\}^{1/2} \in B\}$. We show the stated results by checking the conditions of Theorem 3.1.


Step (i). The arguments given in the ARCH(1) case, with

$$E\{g(\sigma_t) \mid \sigma_{t-1} = x\} = E[g\{(\omega + a(\eta_{t-1})x^2)^{1/2}\}],$$

are sufficient to show that $(\sigma_t)$ is a Feller chain.
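Step (ii). To show the irreducibility, note that (3.23) implies

$$\rho := a(\eta^0) < 1,$$

since $|\eta_t| \geq \eta^0$ a.s. and $a(\cdot)$ is even and increasing on $[0, +\infty)$. Let $\tau' \in (0, \tau)$ be small enough such that

$$\rho_1 := a(\eta^0 + \tau') < 1.$$

If $\sigma_0 = x \in \mathbb{R}^+$, we have, for $t > 0$,

$$\sigma_t^2 = \omega\{1 + a(\eta_{t-1}) + a(\eta_{t-1})a(\eta_{t-2}) + \cdots + a(\eta_{t-1})\cdots a(\eta_1)\} + a(\eta_{t-1})\cdots a(\eta_0)x^2.$$

Conditionally on $\mathcal{E}_t$, defined in (3.20), we have

$$\sigma_t^2 \in I_t = \left[\frac{\omega}{1 - \rho} + \left(x^2 - \frac{\omega}{1 - \rho}\right)\rho^t, \ \frac{\omega}{1 - \rho_1} + \left(x^2 - \frac{\omega}{1 - \rho_1}\right)\rho_1^t\right].$$

Let

$$I = \lim_{t \to \infty} I_t = \left[\frac{\omega}{1 - \rho}, \frac{\omega}{1 - \rho_1}\right].$$

Then, given $\sigma_0 = x$,

$$\sigma_t \text{ has, conditionally on } \mathcal{E}_t, \text{ a positive density on } J_t,$$

where $J_t = \{x \in \mathbb{R}^+ \mid x^2 \in I_t\}$. Let $\lambda_J$ be the restriction of the Lebesgue measure to $J = \left[\sqrt{\omega/(1 - \rho)}, \sqrt{\omega/(1 - \rho_1)}\right]$. We have

$$\lambda_J(B) > 0 \implies \exists t, \ \lambda(B \cap J_t) > 0 \implies \exists t, \ P(\sigma_t \in B \mid \sigma_0 = x) \geq P(\sigma_t \in B \mid \{\sigma_0 = x\} \cap \mathcal{E}_t)P(\mathcal{E}_t) > 0.$$

The chain $(\sigma_t)$ is thus $\phi$-irreducible with $\phi = \lambda_J$.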


Step (iii). We again use Lemma 2.2. By the arguments used in the ARCH(1) case, there exists $s \in (0, 1]$ such that

$$c_1 := E\{a(\eta_{t-1})^s\} < 1.$$

Define the test function by $V(x) = 1 + x^{2s}$, let $0 < \delta < 1 - c_1$ and let the compact set

$$A = \{x \in \mathbb{R}^+; \ \omega^s + \delta + (c_1 - 1 + \delta)x^{2s} \geq 0\}.$$

We have, for $x \notin A$,

$$\begin{aligned} E[V(\sigma_t) \mid \sigma_{t-1} = x] &\leq 1 + \omega^s + c_1x^{2s} \\ &< (1 - \delta)V(x), \end{aligned}$$

which proves (3.10). Moreover (3.9) is satisfied.
To be able to apply Theorem 3.1, it remains to show that $\phi(A) > 0$, where $\phi$ is the above irreducibility measure. In view of the form of the intervals $I$ and $A$, it is clear that, denoting by $\mathring{A}$ the interior of $A$,

$$\phi(A) > 0 \iff \sqrt{\frac{\omega}{1 - \rho}} \in \mathring{A} \iff \omega^s + \delta + (c_1 - 1 + \delta)\left(\frac{\omega}{1 - \rho}\right)^s > 0.$$

Therefore, it suffices to choose $\delta$ sufficiently close to $1 - c_1$ so that the last inequality is satisfied. For such a choice of $\delta$, the compact set $A$ satisfies the assumptions of Theorem 3.1. Consequently, the chain $(\sigma_t)$ is geometrically ergodic. Therefore the nonanticipative strictly stationary solution $(\sigma_t)$, satisfying (3.24) for $t \in \mathbb{Z}$, is geometrically $\beta$-mixing.


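Step (iv). Finally, we show that the process $(\epsilon_t)$ inherits the mixing properties of $(\sigma_t)$. Since $\epsilon_t = \sigma_t\eta_t$, it is sufficient to show that the process $Y_t = (\sigma_t, \eta_t)'$ enjoys the mixing property. It is clear that $(Y_t)$ is a Markov chain on $\mathbb{R}^+ \times \mathbb{R}$ equipped with the Borel $\sigma$-field. Moreover, $(Y_t)$ is strictly stationary because, under condition (3.23), the strictly stationary solution $(\sigma_t)$ is nonanticipative, thus $Y_t$ is a function of $\eta_t, \eta_{t-1}, \ldots$. Moreover, $\sigma_t$ is independent of $\eta_t$. Thus the stationary law of $(Y_t)$ can be denoted by $P_Y = P_\sigma \otimes P_\eta$, where $P_\sigma$ denotes the law of $\sigma_t$ and $P_\eta$ that of $\eta_t$. Let $P^t(y, \cdot)$ be the transition probabilities of the chain $(Y_t)$. We have, for $y = (y_1, y_2) \in \mathbb{R}^+ \times \mathbb{R}$, $B_1 \in \mathcal{B}(\mathbb{R}^+)$, $B_2 \in \mathcal{B}(\mathbb{R})$ and $t > 0$,

$$\begin{aligned} P^t(y, B_1 \times B_2) &= P(\sigma_t \in B_1, \eta_t \in B_2 \mid \sigma_0 = y_1, \eta_0 = y_2) \\ &= P_\eta(B_2)P(\sigma_t \in B_1 \mid \sigma_0 = y_1, \eta_0 = y_2) \\ &= P_\eta(B_2)P(\sigma_t \in B_1 \mid \sigma_1^2 = \omega + a(y_2)y_1^2) \\ &= P_\eta(B_2)P^{t-1}\left(\{\omega + a(y_2)y_1^2\}^{1/2}, B_1\right). \end{aligned}$$

It follows, since $P_\eta$ is a probability, that

$$\|P^t(y, \cdot) - P_Y\| = \left\|P^{t-1}\left(\{\omega + a(y_2)y_1^2\}^{1/2}, \cdot\right) - P_\sigma\right\|.$$

The right-hand side converges to 0 at exponential rate, in view of the geometric ergodicity of $(\sigma_t)$. It follows that $(Y_t)$ is geometrically ergodic and thus $\beta$-mixing. The process $(\epsilon_t)$ is also $\beta$-mixing, since $\epsilon_t$ is a measurable function of $Y_t$. □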


Theorem 3.4 is of interest because it provides a proof of strict stationarity which is completely different from that of Theorem 2.8. A slightly more restrictive assumption on the law of $\eta_t$ has been required, but the result obtained in Theorem 3.4 is stronger.
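Condition (3.23) is a one-dimensional integral which, for $\mathcal{N}(0, 1)$ errors, can be evaluated by quadrature. The Python sketch below assumes Gaussian $\eta_t$ and $\beta > 0$ (so that the integrand is smooth) and uses composite Simpson's rule on $[-12, 12]$, an arbitrary but adequate choice. The parameter pairs are hypothetical; the second one, with $\alpha + \beta = 1$, illustrates that (3.23) holds in the IGARCH case, since by Jensen's inequality $E\log(\alpha\eta_t^2 + \beta) < \log E(\alpha\eta_t^2 + \beta) = 0$.

```python
import math


def gaussian_expectation(h, lower=-12.0, upper=12.0, n=20_000):
    """Composite Simpson approximation of E h(eta) for eta ~ N(0, 1) (n must be even).

    Truncating to [lower, upper] is harmless here: the Gaussian mass beyond 12 is negligible.
    """
    step = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * h(x) * math.exp(-0.5 * x * x)
    return total * step / 3.0 / math.sqrt(2.0 * math.pi)


def garch11_log_moment(alpha, beta):
    """E log(alpha * eta^2 + beta); (3.23) holds when this value is negative."""
    if beta <= 0.0:
        raise ValueError("beta > 0 keeps the integrand smooth; for beta = 0 see (3.18)")
    return gaussian_expectation(lambda x: math.log(alpha * x * x + beta))


for alpha, beta in [(0.1, 0.85), (0.1, 0.9), (0.5, 1.0)]:
    value = garch11_log_moment(alpha, beta)
    print(alpha, beta, round(value, 4), value < 0)
```

The last pair violates (3.23), since $\log(0.5\eta_t^2 + 1) > 0$ almost surely.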


The ARCH(q) Case 


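The approach developed in the case $q = 1$ does not extend trivially to the general case because $(\epsilon_t)$ and $(\sigma_t)$ lose their Markov property when $p > 1$ or $q > 1$. Consider the model

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, \\ \sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2, \end{cases} \tag{3.25}$$

where $\omega > 0$, $\alpha_i \geq 0$, $i = 1, \ldots, q$, and $(\eta_t)$ is defined as in the previous section. We will once again use the Markov representation

$$\underline{z}_t = \underline{b}_t + A_t\underline{z}_{t-1}, \tag{3.26}$$

where

$$A_t = \begin{pmatrix} \alpha_{1:q-1}\eta_t^2 & \alpha_q\eta_t^2 \\ I_{q-1} & 0 \end{pmatrix}, \quad \underline{b}_t = (\omega\eta_t^2, 0, \ldots, 0)', \quad \underline{z}_t = (\epsilon_t^2, \ldots, \epsilon_{t-q+1}^2)',$$

with $\alpha_{1:q-1} = (\alpha_1, \ldots, \alpha_{q-1})$. Recall that $\gamma$ denotes the top Lyapunov exponent of the sequence $\{A_t, t \in \mathbb{Z}\}$.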
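Since $\gamma = \lim_{t \to \infty} t^{-1}\log\|A_tA_{t-1}\cdots A_1\|$ almost surely, the condition $\gamma < 0$ used in Theorem 3.5 can be checked by simulation. The Python sketch below assumes $\mathcal{N}(0, 1)$ errors, hypothetical coefficients, and an arbitrary seed and sample size. It follows $A_t\cdots A_1v$ for $v = (1, \ldots, 1)'$, renormalizing at each step to avoid overflow; because the $A_t$ have nonnegative entries, $\|A_t\cdots A_1v\|_1$ equals $\|A_t\cdots A_1\| = \sum_{ij}|(A_t\cdots A_1)_{ij}|$. For $q = 1$, $A_t = \alpha\eta_t^2$ and $\gamma = \log\alpha + E\log\eta_t^2$, which provides a check against the closed form.

```python
import math
import random


def arch_lyapunov(alphas, n=100_000, seed=1):
    """Estimate gamma = lim (1/n) log ||A_n ... A_1|| for the ARCH(q) matrices of (3.26).

    Assumes eta_t iid N(0, 1). The product is applied to v = (1, ..., 1)' and the
    vector is renormalized at each step.
    """
    rng = random.Random(seed)
    v = [1.0] * len(alphas)
    log_norm = 0.0
    for _ in range(n):
        eta2 = rng.gauss(0.0, 1.0) ** 2
        # A_t v: first row (alpha_1 eta_t^2, ..., alpha_q eta_t^2), then the I_{q-1} shift.
        v = [eta2 * sum(a * vi for a, vi in zip(alphas, v))] + v[:-1]
        size = sum(v)                  # entries are nonnegative, so this is the l1 norm
        log_norm += math.log(size)
        v = [vi / size for vi in v]
    return log_norm / n


EULER_GAMMA = 0.5772156649015329
exact_arch1 = math.log(0.5) - EULER_GAMMA - math.log(2.0)   # q = 1: log(alpha) + E log eta^2
print(round(arch_lyapunov([0.5]), 3), round(exact_arch1, 3))
print(arch_lyapunov([0.3, 0.3]) < 0)
```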
Theorem 3.5 (Mixing of the ARCH(q) model) If $\eta_t$ has a positive density on a neighborhood of 0 and $\gamma < 0$, then the nonanticipative strictly stationary solution of the ARCH(q) model (3.25) is geometrically $\beta$-mixing.


Proof. We begin by showing that the nonanticipative and strictly stationary solution $(\underline{z}_t)$ of model (3.26) is mixing. We will use Theorem 3.2 because a one-step drift criterion is not sufficient. Using (3.26) and the independence between $\eta_t$ and the past of $\underline{z}_t$, it can be seen that the process $(\underline{z}_t)$ is a Markov chain on $(\mathbb{R}^+)^q$ equipped with the Borel $\sigma$-field, with transition probabilities

$$\forall x \in (\mathbb{R}^+)^q, \ \forall B \in \mathcal{B}((\mathbb{R}^+)^q), \quad P(x, B) = P(\underline{b}_1 + A_1x \in B).$$

The Feller property of the chain $(\underline{z}_t)$ is obtained by the arguments employed in the ARCH(1) and GARCH(1, 1) cases, relying on the independence between $\eta_t$ and the past of $\underline{z}_t$, as well as on the continuity of the function $x \mapsto \underline{b}_t + A_tx$.

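In order to establish the irreducibility, let us consider the transitions in $q$ steps. Starting from $\underline{z}_0 = x$, after $q$ transitions the chain reaches a state $\underline{z}_q$ of the form

$$\underline{z}_q = \begin{pmatrix} \eta_q^2\psi_q(\eta_{q-1}, \ldots, \eta_1, x) \\ \vdots \\ \eta_1^2\psi_1(x) \end{pmatrix},$$

where the functions $\psi_i$ are such that $\psi_i(\cdot) \geq \omega > 0$. Let $\tau > 0$ be such that the density $f$ of $\eta_t$ is positive on $(-\tau, \tau)$, and let $\phi$ be the restriction to $[0, \omega\tau^2)^q$ of the Lebesgue measure $\lambda$ on $\mathbb{R}^q$. It follows that, for all $B = B_1 \times \cdots \times B_q \in \mathcal{B}((\mathbb{R}^+)^q)$, $\phi(B) > 0$ implies that, for all $x \in (\mathbb{R}^+)^q$, all $y_1, \ldots, y_q \in \mathbb{R}$ and all $i = 1, \ldots, q$,

$$\lambda\left\{\frac{1}{\psi_i(y_{i-1}, \ldots, y_1, x)}B_i \cap [0, \tau^2)\right\} > 0,$$

which implies in turn that, for all $x \in (\mathbb{R}^+)^q$, $P^q(x, B) > 0$. We conclude that the chain $(\underline{z}_t)$ is $\phi$-irreducible.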
The same argument shows that

$$\phi(B) > 0 \implies \forall k \geq 0, \ \forall x \in (\mathbb{R}^+)^q, \quad P^{q+k}(x, B) > 0.$$

The criterion given in (3.4) can then be checked, which implies that the chain is aperiodic.
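We now show that condition (iii) of Theorem 3.2 is satisfied with the test function

$$V(x) = 1 + \|x\|^s,$$

where $\|\cdot\|$ denotes the norm $\|A\| = \sum|A_{ij}|$ of a matrix $A = (A_{ij})$ and $s \in (0, 1)$ is such that

$$\rho := E(\|A_{k_0}A_{k_0-1}\cdots A_1\|^s) < 1$$

for some integer $k_0 \geq 1$. The existence of $s$ and $k_0$ is guaranteed by Lemma 2.3. Iterating (3.26), we have

$$\underline{z}_{k_0} = \underline{b}_{k_0} + \sum_{k=0}^{k_0-2}A_{k_0}\cdots A_{k_0-k}\underline{b}_{k_0-k-1} + A_{k_0}\cdots A_1\underline{z}_0.$$

The norm being multiplicative, and using the inequality $(a + b)^s \leq a^s + b^s$ for $a, b \geq 0$, it follows that

$$\|\underline{z}_{k_0}\|^s \leq \|\underline{b}_{k_0}\|^s + \sum_{k=0}^{k_0-2}\|A_{k_0}\cdots A_{k_0-k}\|^s\|\underline{b}_{k_0-k-1}\|^s + \|A_{k_0}\cdots A_1\|^s\|\underline{z}_0\|^s.$$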
k=0 
Thus, for all $x \in (\mathbb{R}^+)^q$,

$$\begin{aligned} E\{V(\underline{z}_{k_0}) \mid \underline{z}_0 = x\} &\leq 1 + E\|\underline{b}_{k_0}\|^s + \sum_{k=0}^{k_0-2}E\|A_{k_0}\cdots A_{k_0-k}\|^s\,E\|\underline{b}_{k_0-k-1}\|^s + \rho\|x\|^s \\ &= K + \rho\|x\|^s. \end{aligned}$$

The inequality comes from the independence between $A_t$ and $\underline{b}_{t-i}$ for $i > 0$. The existence of the expectations on the right-hand side of the inequality comes from arguments used to show (2.33).
Let $\delta > 0$ be such that $1 - \delta > \rho$ and let $C$ be the subset of $(\mathbb{R}^+)^q$ defined by

$$C = \{x \mid (1 - \delta - \rho)\|x\|^s \leq K - (1 - \delta)\}.$$

We have $C \neq \emptyset$ because $K > 1 - \delta$. Moreover, $C$ is compact because $1 - \delta - \rho > 0$. Condition (3.14) is clearly satisfied, $V$ being greater than 1. Moreover, (3.15) also holds true for $n = k_0 - 1$. We conclude that, in view of Theorem 3.2, the chain $(\underline{z}_t)$ is geometrically ergodic and, when it is initialized with the stationary measure, the chain is stationary and $\beta$-mixing.

Consequently, the process (€?), where (e+) is the nonanticipative strictly stationary solution 
of model (3.25), is B-mixing, as a measurable function of z,. This argument is not sufficient to 
conclude concerning (€;). For k > 0, let 


Yo = f(...,€1, €0), Zk = B(Eks Cet1s +++) 
where f and g are measurable functions. Note that 
E(Yo | €f, t € Z) = E(Yo | €f, t < 0, n, u > 1) = E(Yo | €f, t < 0). 


Similarly, we have E (Z; | Efst €Z) = E(Z, | de t > k), and we have independence between Yo 
and Z, conditionally on (€). Thus, we obtain 
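$$\begin{aligned}
\mathrm{Cov}(Y_0, Z_k) &= E(Y_0 Z_k) - E(Y_0)E(Z_k) \\
&= E\{E(Y_0 Z_k \mid \epsilon_t^2, t \in \mathbb{Z})\} - E\{E(Y_0 \mid \epsilon_t^2, t \in \mathbb{Z})\}\, E\{E(Z_k \mid \epsilon_t^2, t \in \mathbb{Z})\} \\
&= E\{E(Y_0 \mid \epsilon_t^2, t \leq 0)\, E(Z_k \mid \epsilon_t^2, t \geq k)\} - E\{E(Y_0 \mid \epsilon_t^2, t \leq 0)\}\, E\{E(Z_k \mid \epsilon_t^2, t \geq k)\} \\
&= \mathrm{Cov}\{E(Y_0 \mid \epsilon_t^2, t \leq 0),\ E(Z_k \mid \epsilon_t^2, t \geq k)\} \\
&= \mathrm{Cov}\{\tilde f(\ldots, \epsilon_{-1}^2, \epsilon_0^2),\ \tilde g(\epsilon_k^2, \epsilon_{k+1}^2, \ldots)\}
\end{aligned}$$

for some measurable functions $\tilde f$ and $\tilde g$. It follows, in view of the definition (A.5) of the strong mixing coefficients, that

$$\alpha_\epsilon(k) \leq \alpha_{\epsilon^2}(k).$$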


In view of (A.6), we also have $\beta_\epsilon(k) \leq \beta_{\epsilon^2}(k)$. Actually, (A.7) entails that the converse inequalities are always true, so we have $\alpha_{\epsilon^2}(k) = \alpha_\epsilon(k)$ and $\beta_{\epsilon^2}(k) = \beta_\epsilon(k)$. The theorem is thus shown.
3.3 Bibliographical Notes


A major reference on ergodicity and mixing of general Markov chains is Meyn and Tweedie (1996). 
For a less comprehensive presentation, see Chan (1990), Tjøstheim (1990) and Tweedie (1998). 
For survey papers on mixing conditions, see Bradley (1986, 2005). We also mention the book by 
Doukhan (1994) which proposes definitions and examples of other types of mixing, as well as 
numerous limit theorems. 

For vectorial representations of the form (3.26), the Feller, aperiodicity and irreducibility prop- 
erties were established by Cline and Pu (1998, Theorem 2.2), under assumptions on the error 
distribution and on the regularity of the transitions. 

The geometric ergodicity and mixing properties of the GARCH(p, q) processes were estab- 
lished in the PhD thesis of Boussama (1998), using results of Mokkadem (1990) on polynomial 
processes. The proofs use concepts of algebraic geometry to determine a subspace of the states 
on which the chain is irreducible. For the GARCH(1, 1) and ARCH(q) models we did not need 
such sophisticated notions. The proofs given here are close to those given in Francq and Zakoian 
(2006a), which considers more general GARCH(1, 1) models. Mixing properties were obtained by 
Carrasco and Chen (2002) for various GARCH-type models under stronger conditions than the strict
stationarity (for example, $\alpha + \beta < 1$ for a standard GARCH(1, 1); see their Table 1). Recently,
Meitz and Saikkonen (2008a, 2008b) showed mixing properties under mild moment assumptions 
for a general class of first-order Markov models, and applied their results to the GARCH(1, 1). 

The mixing properties of ARCH(∞) models are studied by Fryzlewicz and Subba Rao (2009).
They develop a method for establishing geometric ergodicity which, contrary to the approach of 
this chapter, does not rely on the Markov chain theory. Other approaches, for instance developed 
by Ango Nze and Doukhan (2004) and Hörmann (2008), aim to establish probability properties 
(different from mixing) of GARCH-type sequences, which can be used to establish central limit 
theorems. 


3.4 Exercises 


3.1 (Irreducibility condition for an AR(1) process)
Given a sequence $(\epsilon_t)_{t \in \mathbb{N}}$ of iid centered variables of law $P_\epsilon$, which is absolutely continuous with respect to the Lebesgue measure $\lambda$ on $\mathbb{R}$, let $(X_t)_{t \in \mathbb{N}}$ be the AR(1) process defined by

$$X_t = \theta X_{t-1} + \epsilon_t, \quad t > 0,$$

where $\theta \in \mathbb{R}$.
(a) Show that if $P_\epsilon$ has a positive density over $\mathbb{R}$, then $(X_t)$ constitutes a λ-irreducible chain.


(b) Show that if the density of $\epsilon_t$ is not positive over all of $\mathbb{R}$, the existence of an irreducibility measure is not guaranteed.


3.2 (Equivalence between stationarity and invariance of the initial measure) 
Show the equivalence (3.3). 


3.3 (Invariance of the limit law)
Show that if π is a probability such that for all $B$, $P_\mu(X_t \in B) \to \pi(B)$ when $t \to \infty$, then π is invariant.


3.4 (Small sets for AR(1)) 
For the AR(1) model of Exercise 3.1, show directly that if the density f of the error term is 
positive everywhere, then the compacts of the form [—c, c], c > 0, are small sets. 


3.5 (From Feigin and Tweedie, 1985)
For the bilinear model

$$X_t = \theta X_{t-1} + b\epsilon_t X_{t-1} + \epsilon_t, \quad t \geq 0,$$

where $(\epsilon_t)$ is as in Exercise 3.1(a), show that if

$$E|\theta + b\epsilon_t| < 1,$$

then there exists a unique strictly stationary solution and this solution is geometrically ergodic.


3.6 (Lower bound for the empirical mean of the $P^t(x_0, A)$)
Show inequality (3.12).


3.7 (Invariant probability)
Show the invariance of the probability π satisfying (3.13).

Hints: (i) For a function $g$ which is continuous and positive (but not necessarily with compact support), this equality becomes the inequality

$$\liminf_{k \to \infty} \int_E g(y)\, Q_{n_k}(x_0, dy) \geq \int_E g(y)\, \pi(dy)$$

(see Meyn and Tweedie, 1996, Lemma D.5.5).
(ii) For all σ-finite measures μ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ we have

$$\forall B \in \mathcal{B}(\mathbb{R}), \quad \mu(B) = \sup\{\mu(C);\ C \subset B,\ C \text{ compact}\}$$

(see Meyn and Tweedie, 1996, Theorem D.3.2).


3.8 (Mixing of the ARCH(1) model for an asymmetric density)
Show that Theorem 3.3 remains true when Assumption A is replaced by the following:
The law $P_\eta$ is absolutely continuous, with density $f$, with respect to λ. There exists $\tau > 0$ such that

$$(\eta_0^- - \tau, \eta_0^-) \cup (\eta_0^+, \eta_0^+ + \tau) \subset \{f > 0\},$$

where $\eta_0^- = \sup\{\eta \mid \eta < 0, f(\eta) > 0\}$ and $\eta_0^+ = \inf\{\eta \mid \eta > 0, f(\eta) > 0\}$.


3.9 (A result on decreasing sequences)
Show that if $(u_n)$ is a decreasing sequence of positive real numbers such that $\sum_n u_n < \infty$, we have $\sup_n n u_n < \infty$. Show that this result applies to the proof of Corollary A.3 in Appendix A.


3.10 (Complements to the proof of Corollary A.3)
Complete the proof of Corollary A.3 by showing that the term $d_4$ is uniformly bounded in $t$, $h$ and $k$.


3.11 (Nonmixing chain)
Consider the nonmixing Markov chain defined in Example A.3. Which of the assumptions (i)-(ii) in Theorem 3.1 does the chain satisfy and which does it not satisfy?


Temporal Aggregation and Weak GARCH Models


Most financial series are analyzed at different frequencies (daily, weekly, monthly, ...). The 
properties of a series and, as a consequence, of the model fitted to the series, often crucially 
depend on the observation frequency. For instance, empirical studies generally find a stronger 
persistence (that is, $\alpha + \beta$ closer to 1) in GARCH(1, 1) models, when the frequency increases.

For a given asset, observed at different frequencies, a natural question is whether strong 
GARCH models at different frequencies are compatible. If the answer is positive, the class of 
GARCH models will be called stable by temporal aggregation. In this chapter, we consider, 
more generally, invariance properties of the class of GARCH processes with respect to time 
transformations frequently encountered in econometrics. It will be seen that, to obtain stability 
properties, a wider class of GARCH-type models, called weak GARCH and based on the $L^2$
structure of the squared returns, has to be introduced. 


4.1 Temporal Aggregation of GARCH Processes 


Temporal aggregation arises when the frequency of the observations is lower than that of data generation, so that the underlying process is only partially observed. The time series resulting from temporal aggregation may of course have very different properties from the original time series. More formally, the temporal aggregation problem can be formulated as follows: given a process $(X_t)$ and an integer $m$, what are the properties of the sampled process $(X_{mt})$ (that is, the process constructed from $(X_t)$ by only keeping every $m$th observation)? Does the aggregated process $(X_{mt})$ belong to the same class of models as the original process $(X_t)$? If this holds for any $(X_t)$ and any $m$, the class is said to be stable by temporal aggregation.

An elementary example of a model that is stable by temporal aggregation is obviously white 
noise (strong or weak): the independence (or noncorrelation) property is kept for the aggregated 
process, as well as the property of zero mean and fixed variance. On the other hand, ARMA 
models in the strong sense are generally not stable by temporal aggregation. It is only by relaxing 
the assumption of noise independence, that is, by considering the class of weak ARMA models, 
that temporal aggregation holds. 
We shall see that, like many strong models (based on an iid white noise), GARCH models in the
strong or semi-strong sense (that is, in the sense of Definition 2.1), are not stable by aggregation: a 
GARCH model at a given frequency is not compatible with a GARCH model at another frequency. 
As for ARMA models, temporal aggregation is obtained by enlarging the class of GARCH. 


4.1.1 Nontemporal Aggregation of Strong Models 


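To show that temporal aggregation does not hold for GARCH models, it suffices to consider the ARCH(1) example. Let $(\epsilon_t)$ be the nonanticipative, second-order stationary solution of the model:

$$\epsilon_t = (\omega + \alpha\epsilon_{t-1}^2)^{1/2}\eta_t, \quad \omega > 0, \ 0 < \alpha < 1, \ (\eta_t) \text{ iid } (0, 1), \ E(\eta_t^4) = \mu_4 < \infty.$$

The model satisfied by the even-numbered observations is easily obtained:

$$\epsilon_{2t} = \{\omega(1 + \alpha\eta_{2t-1}^2) + \alpha^2\eta_{2t-1}^2\epsilon_{2(t-1)}^2\}^{1/2}\eta_{2t}.$$

It follows that

$$E(\epsilon_{2t} \mid \epsilon_{2(t-1)}, \epsilon_{2(t-2)}, \ldots) = 0, \qquad \mathrm{Var}(\epsilon_{2t} \mid \epsilon_{2(t-1)}, \epsilon_{2(t-2)}, \ldots) = \omega(1 + \alpha) + \alpha^2\epsilon_{2(t-1)}^2,$$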


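because $\eta_{2t}$ and $\eta_{2t-1}$ are independent of the variables involved in the conditioning. Thus, the process $(\epsilon_{2t})$ is ARCH(1) in the semi-strong sense (Definition 2.1). It is a strong ARCH if the process defined by dividing $\epsilon_{2t}$ by its conditional standard deviation,

$$\tilde\eta_t = \frac{\epsilon_{2t}}{\{\omega(1 + \alpha) + \alpha^2\epsilon_{2(t-1)}^2\}^{1/2}} = \left\{\frac{\omega(1 + \alpha\eta_{2t-1}^2) + \alpha^2\eta_{2t-1}^2\epsilon_{2(t-1)}^2}{\omega(1 + \alpha) + \alpha^2\epsilon_{2(t-1)}^2}\right\}^{1/2}\eta_{2t},$$

is iid. We have seen that $E(\tilde\eta_t \mid \epsilon_{2(t-1)}, \epsilon_{2(t-2)}, \ldots) = 0$ and $E(\tilde\eta_t^2 \mid \epsilon_{2(t-1)}, \epsilon_{2(t-2)}, \ldots) = 1$, but

$$\begin{aligned}
E(\tilde\eta_t^4 \mid \epsilon_{2(t-1)}, \epsilon_{2(t-2)}, \ldots)
&= \mu_4\,\frac{\omega^2(1 + 2\alpha + \alpha^2\mu_4) + 2\omega\alpha^2(1 + \alpha\mu_4)\epsilon_{2(t-1)}^2 + \mu_4\alpha^4\epsilon_{2(t-1)}^4}{\{\omega(1 + \alpha) + \alpha^2\epsilon_{2(t-1)}^2\}^2} \\
&= \mu_4\left[1 + (\mu_4 - 1)\left\{\frac{\alpha\omega + \alpha^2\epsilon_{2(t-1)}^2}{\omega(1 + \alpha) + \alpha^2\epsilon_{2(t-1)}^2}\right\}^2\right].
\end{aligned}$$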


If $E(\tilde\eta_t^4 \mid \epsilon_{2(t-1)}, \epsilon_{2(t-2)}, \ldots)$ were a.s. constant, we would have $\alpha = 0$ (no ARCH effect), or $\mu_4 = 1$ ($\eta_t^2 = 1$ a.s.), or, for some constant $K$,

$$\frac{\alpha\omega + \alpha^2\epsilon_{2(t-1)}^2}{\omega(1 + \alpha) + \alpha^2\epsilon_{2(t-1)}^2} = K, \quad \text{a.s.},$$

the latter equality implying that $\epsilon_{2(t-1)}^2 = K^*$ a.s. for another constant $K^*$. By stationarity, $\epsilon_t^2 = K^*$ a.s. for all $t$, and $\eta_t^2 = \epsilon_t^2/(\omega + \alpha\epsilon_{t-1}^2)$ would take only one value, leading us again to the case $\mu_4 = 1$. This proves that the process $(\tilde\eta_t)$ is not iid whenever $\alpha > 0$ (presence of ARCH) and $\mu_4 \neq 1$ (nondegenerate law of $\eta_t^2$). The process $(\epsilon_{2t})$ is thus not strong GARCH, although $(\epsilon_t)$ is strong GARCH.
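The two expressions for $E(\tilde\eta_t^4 \mid \epsilon_{2(t-1)}, \ldots)$ can be cross-checked numerically, and evaluating them at a few values of $\epsilon_{2(t-1)}^2$ shows the dependence on the past that rules out a strong ARCH. A small sketch; the helper names and the values $\mu_4 = 3$ (Gaussian $\eta_t$), $\omega = 1$, $\alpha = 0.5$ are illustrative choices, not taken from the text.

```python
def cond_fourth_moment(eps2_prev, omega, alpha, mu4):
    """E(tilde_eta_t^4 | past) for the even-indexed ARCH(1) subsequence (compact form)."""
    ratio = (alpha * omega + alpha**2 * eps2_prev) / (omega * (1 + alpha) + alpha**2 * eps2_prev)
    return mu4 * (1 + (mu4 - 1) * ratio**2)


def cond_fourth_moment_expanded(eps2_prev, omega, alpha, mu4):
    """Same quantity, from the expanded numerator divided by the squared conditional variance."""
    num = (omega**2 * (1 + 2 * alpha + alpha**2 * mu4)
           + 2 * omega * alpha**2 * (1 + alpha * mu4) * eps2_prev
           + mu4 * alpha**4 * eps2_prev**2)
    den = (omega * (1 + alpha) + alpha**2 * eps2_prev) ** 2
    return mu4 * num / den


# The conditional fourth moment changes with eps^2_{2(t-1)}, so tilde_eta_t cannot be iid.
for x in (0.0, 1.0, 10.0):
    print(x, cond_fourth_moment(x, 1.0, 0.5, 3.0), cond_fourth_moment_expanded(x, 1.0, 0.5, 3.0))
```

Setting `alpha=0.0` returns the constant $\mu_4$, which is the no-ARCH case excluded in the argument above.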

It can be shown that this property extends to any integer m (Exercise 4.1). 

From this example, it might seem that strong GARCH processes aggregate in the class of 
semi-strong GARCH processes. We shall now see that this is not the case. 
4.1.2 Nonaggregation in the Class of Semi-Strong GARCH Processes


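Let $(\epsilon_t)$ denote the nonanticipative, second-order stationary solution of the strong ARCH(2) model:

$$\epsilon_t = \{\omega + \alpha_1\epsilon_{t-1}^2 + \alpha_2\epsilon_{t-2}^2\}^{1/2}\eta_t, \quad \omega, \alpha_1, \alpha_2 > 0, \ \alpha_1 + \alpha_2 < 1,$$

under the same assumptions on $(\eta_t)$ as before. In view of (2.4), the AR(2) representation satisfied by $(\epsilon_t^2)$ is

$$\epsilon_t^2 = \omega + \alpha_1\epsilon_{t-1}^2 + \alpha_2\epsilon_{t-2}^2 + \nu_t, \tag{4.1}$$

where $(\nu_t)$ is the strong innovation of $(\epsilon_t^2)$. Using the lag operator, this model is written as

$$(1 - \lambda_1 L)(1 + \lambda_2 L)\epsilon_t^2 = \omega + \nu_t,$$

where $\lambda_1$ and $\lambda_2$ are real positive numbers (such that $\lambda_1 - \lambda_2 = \alpha_1$ and $\lambda_1\lambda_2 = \alpha_2$). Multiplying this equation by $(1 + \lambda_1 L)(1 - \lambda_2 L)$, we obtain

$$(1 - \lambda_1^2 L^2)(1 - \lambda_2^2 L^2)\epsilon_t^2 = \omega(1 + \lambda_1)(1 - \lambda_2) + (1 + \lambda_1 L)(1 - \lambda_2 L)\nu_t,$$

that is,

$$(1 - \lambda_1^2 L)(1 - \lambda_2^2 L)y_t = \omega^* + v_t,$$

where $\omega^* = \omega(1 + \lambda_1)(1 - \lambda_2)$, $v_t = \nu_{2t} + (\lambda_1 - \lambda_2)\nu_{2t-1} - \lambda_1\lambda_2\nu_{2t-2}$ and $y_t = \epsilon_{2t}^2$ (in this last equation, $L$ lags the index of $y_t$). Observe that $(v_t)$ is an MA(1) process, such that

$$\begin{aligned}
\mathrm{Cov}(v_t, v_{t-1}) &= \mathrm{Cov}\{\nu_{2t} + (\lambda_1 - \lambda_2)\nu_{2t-1} - \lambda_1\lambda_2\nu_{2t-2},\ \nu_{2t-2} + (\lambda_1 - \lambda_2)\nu_{2t-3} - \lambda_1\lambda_2\nu_{2t-4}\} \\
&= -\lambda_1\lambda_2\mathrm{Var}(\nu_t).
\end{aligned}$$

It follows that $(v_t)$ can be written as $v_t = u_t - \theta u_{t-1}$, where $(u_t)$ is a white noise and $\theta$ is a constant depending on $\lambda_1$ and $\lambda_2$. Finally, $y_t = \epsilon_{2t}^2$ has the following ARMA(2, 1) representation:

$$\epsilon_{2t}^2 = \omega^* + (\lambda_1^2 + \lambda_2^2)\epsilon_{2(t-1)}^2 - \lambda_1^2\lambda_2^2\epsilon_{2(t-2)}^2 + u_t - \theta u_{t-1}. \tag{4.2}$$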


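The ARMA orders are compatible with a semi-strong GARCH(1, 2) model for $(\epsilon_{2t})_t$, with conditional variance

$$\tilde\sigma_t^2 = \mathrm{Var}(\epsilon_{2t} \mid \epsilon_{2(t-1)}, \epsilon_{2(t-2)}, \ldots) = \tilde\omega + \tilde\alpha_1\epsilon_{2(t-1)}^2 + \tilde\alpha_2\epsilon_{2(t-2)}^2 + \tilde\beta\tilde\sigma_{t-1}^2, \quad \tilde\omega > 0, \ \tilde\alpha_1 \geq 0, \ \tilde\alpha_2 \geq 0, \ \tilde\beta \geq 0.$$

If $(\epsilon_{2t})_t$ were such a semi-strong GARCH(1, 2) process, the corresponding ARMA(2, 1) representation would then be

$$\epsilon_{2t}^2 = \tilde\omega + (\tilde\alpha_1 + \tilde\beta)\epsilon_{2(t-1)}^2 + \tilde\alpha_2\epsilon_{2(t-2)}^2 + \tilde\nu_t - \tilde\beta\tilde\nu_{t-1},$$

in view of (2.4). This equation is not compatible with (4.2), because of the sign of the coefficient of $\epsilon_{2(t-2)}^2$: it is $\tilde\alpha_2 \geq 0$ here, whereas it is $-\lambda_1^2\lambda_2^2 < 0$ in (4.2). We can conclude that if $(\epsilon_t)$ is a strong ARCH(2), $(\epsilon_{2t})$ is never a semi-strong GARCH.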
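The coefficients of (4.2) can be computed directly from $(\omega, \alpha_1, \alpha_2)$. The sketch below recovers $\lambda_1, \lambda_2$ from $\lambda_1 - \lambda_2 = \alpha_1$ and $\lambda_1\lambda_2 = \alpha_2$ and returns $\omega^*$ and the two AR coefficients; the function name and the numerical values are illustrative.

```python
import math


def aggregated_arch2_ar(omega, alpha1, alpha2):
    """Constant and AR coefficients of (4.2) for an ARCH(2) sampled every second period."""
    root = math.sqrt(alpha1**2 + 4 * alpha2)
    lam1 = (alpha1 + root) / 2          # so that lam1 - lam2 = alpha1
    lam2 = (root - alpha1) / 2          # and lam1 * lam2 = alpha2
    omega_star = omega * (1 + lam1) * (1 - lam2)
    ar1 = lam1**2 + lam2**2             # coefficient of eps^2_{2(t-1)}
    ar2 = -(lam1**2) * lam2**2          # coefficient of eps^2_{2(t-2)}: always negative
    return omega_star, ar1, ar2


omega_star, ar1, ar2 = aggregated_arch2_ar(omega=1.0, alpha1=0.3, alpha2=0.2)
print(omega_star, ar1, ar2)
```

Since $\lambda_1^2 + \lambda_2^2 = (\lambda_1 - \lambda_2)^2 + 2\lambda_1\lambda_2$, the AR coefficients simplify to $\alpha_1^2 + 2\alpha_2$ and $-\alpha_2^2$, and $\omega^* = \omega(1 + \alpha_1 - \alpha_2)$; the negative second coefficient is exactly the incompatibility noted above.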
4.2 Weak GARCH


The previous example shows that the square of a process obtained by temporal aggregation of 
a strong or semi-strong GARCH admits an ARMA representation. This leads us to the follow- 
ing definition. 


Definition 4.1 (Weak GARCH process) A fourth-order stationary process $(\epsilon_t)$ is said to be a weak GARCH(r, p) if:

(i) $(\epsilon_t)$ is a white noise;
(ii) $(\epsilon_t^2)$ admits an ARMA representation of the form

$$\epsilon_t^2 - \sum_{i=1}^{r} a_i\epsilon_{t-i}^2 = c + \nu_t - \sum_{i=1}^{p} b_i\nu_{t-i}, \tag{4.3}$$

where $(\nu_t)$ is the linear innovation of $(\epsilon_t^2)$.


Recall that the property of linear innovation entails that

$$\mathrm{Cov}(\nu_t, \epsilon_{t-k}^2) = 0, \quad \forall k > 0.$$


From (2.4), semi-strong GARCH(p, q) processes satisfy, under the fourth-order stationarity condition, Definition 4.1 with $r = \max(p, q)$. The linear innovation coincides in this case with the strong innovation: $\nu_t$ is thus uncorrelated with any variable of the past of $\epsilon_t$ (provided this correlation exists).


Remark 4.1 (Generality of the weak GARCH class) Let $(X_t)$ denote a strictly stationary, purely nondeterministic process, admitting moments up to order 4. By the Wold theorem, $(X_t)$ admits an MA(∞) representation. Suppose this representation is invertible and there exists an ARMA representation of the form

$$X_t + \sum_{i=1}^{P}\phi_i X_{t-i} = \epsilon_t + \sum_{i=1}^{Q}\psi_i\epsilon_{t-i},$$

where $(\epsilon_t)$ is a weak white noise with variance $\sigma^2 > 0$, and the polynomials $\Phi(z) = 1 + \phi_1 z + \cdots + \phi_P z^P$ and $\Psi(z) = 1 + \psi_1 z + \cdots + \psi_Q z^Q$ have all their roots outside the unit disk and have no common root. Without loss of generality, suppose that $\phi_P \neq 0$ and $\psi_Q \neq 0$ (by convention, $\phi_0 = \psi_0 = 1$). The process $(\epsilon_t)$ can then be interpreted as the linear innovation of $(X_t)$. The process $(\epsilon_t^2)_{t \in \mathbb{Z}}$ is clearly second-order stationary and purely nondeterministic. It follows that it admits an MA(∞) representation by the Wold theorem. If this representation is invertible, the process $(\epsilon_t)$ is a weak GARCH process.


The class of weak GARCH processes is not limited to processes obtained by temporal aggregation. Before returning to temporal aggregation, we conclude this section with further examples of weak GARCH processes.


Example 4.1 (GARCH with measurement error) Suppose that a GARCH process $(\epsilon_t)$ is observed with a measurement error $W_t$. We have

$$e_t = \epsilon_t + W_t, \qquad \epsilon_t = \sigma_t Z_t, \qquad \sigma_t^2 = c + \sum_{i=1}^{q} a_i\epsilon_{t-i}^2 + \sum_{i=1}^{p} b_i\sigma_{t-i}^2. \tag{4.4}$$

For simplicity, it can be assumed that the sequences $(Z_t)$ and $(W_t)$ are mutually independent, iid and centered, with variances 1 and $\sigma_W^2$ respectively.
It can be shown (Exercise 4.3) that $(e_t)$ is a weak GARCH process of the form

$$e_t^2 - \sum_{i=1}^{\max\{p,q\}}(a_i + b_i)e_{t-i}^2 = c + \Big(1 - \sum_{i=1}^{\max\{p,q\}}(a_i + b_i)\Big)\sigma_W^2 + u_t + \sum_{i=1}^{\max\{p,q\}}\beta_i u_{t-i},$$

with the convention $a_i = 0$ for $i > q$ and $b_i = 0$ for $i > p$, and where the $\beta_i$ are different from the $-b_i$, unless $\sigma_W^2 = 0$. It is worth noting that the AR part of this representation is not affected by the presence of the perturbation $W_t$.

Statistical inference on GARCH with measurement errors is complicated because the likelihood 
cannot be written in explicit form. Methods using least squares, the Kalman filter or simulations 
have been suggested to estimate these models. 


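Example 4.2 (Quadratic GARCH) Consider the modification of the semi-strong GARCH model given by

$$E(\epsilon_t \mid \epsilon_u, u < t) = 0 \quad\text{and}\quad E(\epsilon_t^2 \mid \epsilon_u, u < t) = \sigma_t^2 = \Big(c + \sum_{i=1}^{q} a_i\epsilon_{t-i}\Big)^2 + \sum_{i=1}^{p} b_i\sigma_{t-i}^2, \tag{4.5}$$

where the constants $b_i$ are positive. Let $u_t = \epsilon_t^2 - \sigma_t^2$. The $u_t$ are nonautocorrelated, uncorrelated with any variable of the future (by the martingale difference assumption) and, by definition, with any variable of the past of $\epsilon_t$. Rewrite the equation for $\sigma_t^2$ as

$$\epsilon_t^2 = c^2 + \sum_{i=1}^{\max\{p,q\}}(a_i^2 + b_i)\epsilon_{t-i}^2 + v_t,$$

where (with $a_i = 0$ for $i > q$ and $b_i = 0$ for $i > p$)

$$v_t = 2c\sum_{i=1}^{q} a_i\epsilon_{t-i} + \sum_{i \neq j} a_i a_j\epsilon_{t-i}\epsilon_{t-j} + u_t - \sum_{i=1}^{p} b_i u_{t-i}. \tag{4.6}$$

It is not difficult to verify that $(v_t)$ is an MA(max{p, q}) process (Exercise 4.4). It follows that $(\epsilon_t)$ is a weak GARCH(max{p, q}, max{p, q}) process.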


Example 4.3 (Markov-switching GARCH) Markov-switching models (ARMA, GARCH) 
allow the coefficients to depend on a Markov chain, in order to take into account the changes of 
regime in the dynamics of the series. The chain being unobserved, these models are also referred 
to as hidden Markov models. 

The simplest Markov-switching GARCH model is obtained when a single parameter ω is allowed to depend on the Markov chain. More precisely, let $(\Delta_t)$ be a Markov chain with state space $\{0, 1, \ldots, K - 1\}$. Suppose that this chain is homogeneous, stationary, irreducible and aperiodic, and let $p_{ij} = P(\Delta_t = j \mid \Delta_{t-1} = i)$, for $i, j = 0, 1, \ldots, K - 1$, be its transition probabilities. The model is given by

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega(\Delta_t) + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \tag{4.7}$$

with

$$\omega(\Delta_t) = \sum_{i=1}^{K}\omega_i\mathbb{1}_{\{\Delta_t = i-1\}}, \qquad 0 < \omega_1 < \omega_2 < \cdots < \omega_K, \tag{4.8}$$


i=1 
where $(\eta_t)$ is an iid (0, 1) process with finite fourth-order moment, the sequence $(\eta_t)$ being independent of the sequence $(\Delta_t)$. Tedious computations show that $(\epsilon_t)$ is a weak GARCH(max{p, q} + K − 1, p + K − 1) process of the form

$$\prod_{k=1}^{K-1}(1 - \lambda_k L)\Big\{1 - \sum_{i=1}^{\max\{p,q\}}(\alpha_i + \beta_i)L^i\Big\}\epsilon_t^2 = \omega + \Big(1 + \sum_{i=1}^{p+K-1}\theta_i L^i\Big)u_t, \tag{4.9}$$

where $\lambda_1, \ldots, \lambda_{K-1}$ are the eigenvalues different from 1 of the matrix $P = (p_{ij})$. The $\theta_i$ generally do not have simple expressions as functions of the initial parameters, but can be obtained numerically from the first autocorrelations of the process $(\epsilon_t^2)$ (Exercise 4.7).
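The $\lambda_k$ entering the AR part of (4.9) are simply the nonunit eigenvalues of the transition matrix. A minimal sketch with NumPy; the helper `nonunit_eigenvalues` and the transition matrix are hypothetical, and the chain is assumed irreducible and aperiodic so that the eigenvalue 1 is simple.

```python
import numpy as np


def nonunit_eigenvalues(P, tol=1e-9):
    """Eigenvalues of the transition matrix P that differ from 1 (the lambda_k of (4.9))."""
    eig = np.linalg.eigvals(np.asarray(P, dtype=float))
    return [lam for lam in eig if abs(lam - 1.0) > tol]


# Hypothetical two-state chain: the nonunit eigenvalue equals p_00 + p_11 - 1.
P2 = [[0.9, 0.1],
      [0.3, 0.7]]
lams = nonunit_eigenvalues(P2)

# AR order of the weak GARCH representation (4.9): max{p, q} + K - 1.
p, q, K = 1, 1, len(P2)
ar_order = max(p, q) + K - 1
print(lams, ar_order)
```

For a chain with $K$ states the function returns $K - 1$ values, possibly complex conjugate pairs, each contributing a factor $(1 - \lambda_k L)$.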


Example 4.4 (Stochastic volatility model) An example of stochastic volatility model is given by

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = c + d\sigma_{t-1}^2 + (a + b\sigma_{t-1}^2)v_t, \quad c, d, b > 0, \ a \geq 0, \tag{4.10}$$

where $(\eta_t)$ and $(v_t)$ are iid (0, 1) sequences, with $\eta_t$ independent of the $v_{t-j}$, $j \geq 0$. Note that the GARCH(1, 1) process is obtained by taking $v_t = \eta_{t-1}^2 - 1$ and $a = 0$. Under the assumption $d^2 + b^2 < 1$, it can be shown (Exercise 4.5) that the autocovariance structure of $(\epsilon_t^2)$ is characterized by

$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) = d\,\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h+1}^2), \quad \forall h > 1.$$

It follows that $(\epsilon_t)$ is a weak GARCH(1, 1) process with

$$\epsilon_t^2 - d\epsilon_{t-1}^2 = c + u_t + \beta u_{t-1}, \tag{4.11}$$

where $(u_t)$ is a weak white noise and β can be explicitly computed.


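Example 4.5 (Contemporaneous aggregation of GARCH processes) It is standard in finance to consider linear combinations of several series (for instance, to define portfolios). If these series are GARCH processes, is their linear combination also a GARCH process? To simplify the presentation, consider the sum of two GARCH(1, 1) processes, defined as the second-order stationary and nonanticipative solutions of

$$\epsilon_{it} = \sigma_{it}\eta_{it}, \qquad \sigma_{it}^2 = \omega_i + \alpha_i\epsilon_{i,t-1}^2 + \beta_i\sigma_{i,t-1}^2, \quad \omega_i > 0, \ \alpha_i, \beta_i \geq 0, \ (\eta_{it})_t \text{ iid } (0, 1), \ i = 1, 2,$$

and suppose that the sequences $(\eta_{1t})$ and $(\eta_{2t})$ are independent. Let $\epsilon_t = \epsilon_{1t} + \epsilon_{2t}$. It is easy to see that $(\epsilon_t)$ is a white noise. We have, for $h > 0$, $\mathrm{Cov}(\epsilon_{it}^2, \epsilon_{j,t-h}^2) = 0$ for $i \neq j$, because the processes $(\epsilon_{1t})$ and $(\epsilon_{2t})$ are independent. Moreover, for $h > 0$,

$$\mathrm{Cov}(\epsilon_{1t}\epsilon_{2t}, \epsilon_{t-h}^2) = E(\epsilon_{1t}\epsilon_{2t}\epsilon_{t-h}^2) = E(\eta_{1t})E(\sigma_{1t}\epsilon_{2t}\epsilon_{t-h}^2) = 0$$

because $\eta_{1t}$ is independent of the other variables. It follows that, for $h > 0$,

$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) = \mathrm{Cov}(\epsilon_{1t}^2, \epsilon_{1,t-h}^2) + \mathrm{Cov}(\epsilon_{2t}^2, \epsilon_{2,t-h}^2). \tag{4.12}$$

By formula (2.61), we deduce that

$$\gamma_{\epsilon^2}(h) := \mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) = \gamma_{\epsilon_1^2}(1)(\alpha_1 + \beta_1)^{h-1} + \gamma_{\epsilon_2^2}(1)(\alpha_2 + \beta_2)^{h-1}, \quad h \geq 1. \tag{4.13}$$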
If $f$ is a function defined on the integers, denote by $Lf$ the function such that $Lf(h) = f(h - 1)$. We have $(1 - \beta L)\beta^h = 0$ for $h > 0$. Relation (4.13) shows that

$$\{1 - (\alpha_1 + \beta_1)L\}\{1 - (\alpha_2 + \beta_2)L\}\gamma_{\epsilon^2}(h) = 0, \quad h > 2.$$

It follows that $(\epsilon_t)$ is a weak GARCH(2, 2) process of the form

$$\{1 - (\alpha_1 + \beta_1)L\}\{1 - (\alpha_2 + \beta_2)L\}\epsilon_t^2 = \omega + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2},$$

where $(u_t)$ is a noise. Since $E\epsilon_t^2 = E\epsilon_{1t}^2 + E\epsilon_{2t}^2$, we obtain $\omega = \{1 - (\alpha_1 + \beta_1)\}\omega_2 + \{1 - (\alpha_2 + \beta_2)\}\omega_1$. Note that the orders obtained for the ARMA representation of $\epsilon_t^2$ are not necessarily the minimum ones. Indeed, if $\alpha_1 + \beta_1 = \alpha_2 + \beta_2$, then $\gamma_{\epsilon^2}(h) = \{\gamma_{\epsilon_1^2}(1) + \gamma_{\epsilon_2^2}(1)\}(\alpha_1 + \beta_1)^{h-1}$, $h \geq 1$, from (4.13). Therefore, $\{1 - (\alpha_1 + \beta_1)L\}\gamma_{\epsilon^2}(h) = 0$ if $h > 1$. Thus $(\epsilon_t)$ is a weak GARCH(1, 1) process. This example can be generalized to higher-order GARCH models (Exercise 4.9).
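The AR coefficients and the constant ω of the weak GARCH(2, 2) representation follow mechanically from the two persistence parameters $\alpha_i + \beta_i$. The sketch below recomputes them from the mean $E\epsilon_t^2 = \omega_1/(1 - \alpha_1 - \beta_1) + \omega_2/(1 - \alpha_2 - \beta_2)$, which gives a check on the closed form for ω; the function name and the numerical values are illustrative, and the MA coefficients $\theta_1, \theta_2$ are not computed here.

```python
def sum_of_garch11_arma(omega1, alpha1, beta1, omega2, alpha2, beta2):
    """AR part and constant of the weak GARCH(2, 2) representation of eps_1t + eps_2t."""
    r1, r2 = alpha1 + beta1, alpha2 + beta2   # persistence of each component, assumed < 1
    # {1 - r1 L}{1 - r2 L} = 1 - (r1 + r2) L + r1 r2 L^2
    ar = (r1 + r2, -r1 * r2)
    # E eps_t^2 = E eps_1t^2 + E eps_2t^2
    mean_sq = omega1 / (1 - r1) + omega2 / (1 - r2)
    omega = (1 - r1) * (1 - r2) * mean_sq
    return ar, omega


ar, omega = sum_of_garch11_arma(0.1, 0.05, 0.9, 0.2, 0.1, 0.8)
print(ar, omega)
```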


Example 4.6 (β-ARCH process) Consider the conditionally heteroscedastic AR(1) model defined by

$$X_t = \phi X_{t-1} + (c + a|X_{t-1}|^{2\beta})^{1/2}\eta_t, \quad |\phi| < 1, \ c > 0, \ a > 0,$$

where $(\eta_t)$ is an iid (0, 1) symmetrically distributed sequence. A difference between this model, called β-ARCH, and the standard ARCH is that the conditional variance of $X_t$ is specified as a function of $X_{t-1}$, not as a function of the noise.
Suppose β = 1 and let

$$\epsilon_t = (c + aX_{t-1}^2)^{1/2}\eta_t.$$

We have

$$\epsilon_t^2 = c + a\Big(\sum_{i=1}^{\infty}\phi^{i-1}\epsilon_{t-i}\Big)^2 + u_t,$$

where $u_t = \epsilon_t^2 - E(\epsilon_t^2 \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots)$. By expanding the squared term we obtain the representation

$$\{1 - (\phi^2 + a)L\}\epsilon_t^2 = c(1 - \phi^2) + v_t - \phi^2 v_{t-1},$$

where $v_t = a\sum_{i \neq j}\phi^{i-1}\phi^{j-1}\epsilon_{t-i}\epsilon_{t-j} + u_t$. Note that the process $(v_t - \phi^2 v_{t-1})$ is MA(1). Consequently, $(\epsilon_t^2)$ is an ARMA(1, 1) process. Finally, the process $(X_t)$ admits a weak AR(1)-GARCH(1, 1) representation.


4.3 Aggregation of Strong GARCH Processes in the Weak GARCH Class


We have seen that the class of semi-strong GARCH models (defined in terms of conditional 
moments) is not large enough to include all processes obtained by temporal aggregation of strong 
GARCH. In this section we show that the weak GARCH class of models is stable by temporal 
aggregation. Before dealing with the general case, we consider the GARCH(1, 1) model, for which 
the solution is more explicit. 


Theorem 4.1 (Aggregation of the GARCH(1, 1) process) Let $(\epsilon_t)$ be a weak GARCH(1, 1) process. Then, for any integer $m \geq 1$, the process $(\epsilon_{mt})$ is also a weak GARCH(1, 1) process. The parameters of the ARMA representations

$$\epsilon_t^2 - a\epsilon_{t-1}^2 = c + \nu_t - b\nu_{t-1} \quad\text{and}\quad \epsilon_{mt}^2 - a_{(m)}\epsilon_{m(t-1)}^2 = c_{(m)} + \nu_{(m),t} - b_{(m)}\nu_{(m),t-1}$$

are related by the relations

$$a_{(m)} = a^m, \qquad c_{(m)} = c\,\frac{1 - a^m}{1 - a},$$

$$\frac{b_{(m)}}{1 + b_{(m)}^2} = \frac{a^{m-1}b(1 - a^2)}{(1 - a^2)(1 + a^{2(m-1)}b^2) + (a - b)^2(1 - a^{2(m-1)})}.$$


Proof. First note that, $(\epsilon_t^2)$ being stationary by assumption, and $(\nu_t)$ being its linear innovation, $a$ is strictly less than 1 in absolute value. Now, if $(\epsilon_t)$ is a white noise, $(\epsilon_{mt})$ is also a white noise. By successive substitutions we obtain

$$\epsilon_t^2 = c(1 + a + \cdots + a^{m-1}) + a^m\epsilon_{t-m}^2 + v_t, \tag{4.14}$$

where $v_t = \nu_t + (a - b)[\nu_{t-1} + a\nu_{t-2} + \cdots + a^{m-2}\nu_{t-m+1}] - a^{m-1}b\nu_{t-m}$. Because $(\nu_t)$ is a noise, we have

$$\mathrm{Cov}(v_t, v_{t-mk}) = 0, \quad \forall k > 1.$$

Hence, $(v_{mt})_{t \in \mathbb{Z}}$ is an MA(1) process, from which it follows that $(\epsilon_{mt}^2)$ is an ARMA(1, 1) process. The constant term and the AR coefficient of this representation appear directly in (4.14), whereas the MA coefficient is obtained as the solution, of absolute value less than 1, of

$$\frac{b_{(m)}}{1 + b_{(m)}^2} = -\frac{\mathrm{Cov}(v_t, v_{t-m})}{\mathrm{Var}(v_t)} = \frac{a^{m-1}b}{1 + (a - b)^2(1 + a^2 + \cdots + a^{2(m-2)}) + a^{2(m-1)}b^2},$$

which, after simplification, gives the claimed formula.
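The formulas of Theorem 4.1 are easy to evaluate. The sketch below computes $(a_{(m)}, b_{(m)}, c_{(m)})$, taking for $b_{(m)}$ the root of modulus less than 1 of $r x^2 - x + r = 0$ with $r = b_{(m)}/(1 + b_{(m)}^2)$, and checks the simplified ratio against the unsimplified one obtained from the definition of $v_t$. Function names and parameter values are illustrative; $|a| < 1$ is assumed.

```python
import math


def aggregate_garch11(a, b, c, m):
    """Weak GARCH(1,1) parameters (a_m, b_m, c_m) of (eps_{mt}) from those of (eps_t), |a| < 1."""
    a_m = a**m
    c_m = c * (1 - a**m) / (1 - a)
    # r = b_m / (1 + b_m^2), simplified closed form of Theorem 4.1
    num = a**(m - 1) * b * (1 - a**2)
    den = (1 - a**2) * (1 + a**(2 * (m - 1)) * b**2) + (a - b)**2 * (1 - a**(2 * (m - 1)))
    r = num / den
    # root of r*x**2 - x + r = 0 with |x| < 1 (an MA(1) lag-1 autocorrelation has |r| <= 1/2)
    b_m = 0.0 if r == 0 else (1 - math.sqrt(1 - 4 * r**2)) / (2 * r)
    return a_m, b_m, c_m


def ratio_unsimplified(a, b, m):
    """-Cov(v_t, v_{t-m}) / Var(v_t), computed from the definition of v_t in (4.14)."""
    geom = sum(a**(2 * j) for j in range(m - 1))       # 1 + a^2 + ... + a^{2(m-2)}
    return a**(m - 1) * b / (1 + (a - b)**2 * geom + a**(2 * (m - 1)) * b**2)


for m in (1, 2, 5, 20):
    print(m, aggregate_garch11(a=0.9, b=0.6, c=0.1, m=m))
```

The printed sequence illustrates the remark below: $a_{(m)}$ and $b_{(m)}$ shrink towards 0 as $m$ grows.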


Note, in particular, that the aggregate of an ARCH(1) process is another ARCH(1): $b = 0 \Rightarrow b_{(m)} = 0$.

It is also worth noting that $a^m$ tends to 0 when $m$ tends to infinity, thus $a_{(m)}$ and $b_{(m)}$ also tend to 0. In other words, the conditional heteroscedasticity tends to vanish by temporal aggregation of GARCH processes. This conforms to the empirical observation that low-frequency series (weekly, monthly) display less ARCH effect than daily series, for instance.

The previous result can be straightforwardly extended to the GARCH(1, p) case. Denote by $[x]$ the integer part of $x$.


Theorem 4.2 (Aggregation of the GARCH(1, p) process) Let $(\epsilon_t)$ be a weak GARCH(1, p) process. Then, for any integer $m \geq 1$, the process $(\epsilon_{mt})$ is a weak GARCH$(1, 1 + [\frac{p-1}{m}])$ process.


Proof. In the proof of Theorem 4.1, equation (4.14) remains valid subject to a modification of the definition of $v_t$. Introduce the lag polynomial $Q(L) = 1 - b_1L - \cdots - b_pL^p$. We have $v_t = Q(L)[1 + aL + \cdots + a^{m-1}L^{m-1}]\nu_t$. Thus, because $(\nu_t)$ is a noise,

$$\mathrm{Cov}(v_t, v_{t-mk}) = 0, \quad \forall k > 1 + \frac{p-1}{m}.$$

Hence, $(v_{mt})$ is an MA$(1 + [\frac{p-1}{m}])$ process, and the conclusion follows.


It can be seen from (4.14) that the constant term and the AR coefficient of the ARMA representation of $(\epsilon_{mt}^2)$ are the same as in Theorem 4.1. The coefficients of the MA part can be determined through the first $2 + [\frac{p-1}{m}]$ autocovariances of the process $(v_{mt})$.
Note that temporal aggregation always entails a reduction of the MA order in the ARMA representation (except when p = 1, for which it remains equal to 1), all the more so as m increases. Let us now turn to the general case.


Theorem 4.3 (Aggregation of the GARCH(r, p) process) Let $(\epsilon_t)$ be a weak GARCH(r, p) process. Then for any integer $m \geq 1$, the process $(\epsilon_{mt})$ is a weak GARCH$(r, r + [\frac{p-r}{m}])$ process.


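Proof. Denote by $\lambda_i$ $(1 \leq i \leq r)$ the inverses of the complex roots of the AR polynomial of the ARMA representation for $(\epsilon_t^2)$. Write model (4.3) in the form

$$\prod_{i=1}^{r}(1 - \lambda_i L)(\epsilon_t^2 - \mu) = Q(L)\nu_t,$$

where $\mu = E\epsilon_t^2$ and $Q(L) = 1 - \sum_{i=1}^{p} b_i L^i$. Applying the operator $\prod_{i=1}^{r}(1 + \lambda_i L + \cdots + \lambda_i^{m-1}L^{m-1})$ to this equation we get

$$\prod_{i=1}^{r}(1 - \lambda_i^m L^m)(\epsilon_t^2 - \mu) = \prod_{i=1}^{r}(1 + \lambda_i L + \cdots + \lambda_i^{m-1}L^{m-1})Q(L)\nu_t.$$

Consider now the process $(Z_t^{(m)})$ defined by $Z_t^{(m)} = \epsilon_{mt}^2 - \mu$. We have the model

$$\prod_{i=1}^{r}(1 - \lambda_i^m L)Z_t^{(m)} = \prod_{i=1}^{r}(1 + \lambda_i L + \cdots + \lambda_i^{m-1}L^{m-1})Q(L)\nu_{mt} := v_t,$$

with the convention that $L\nu_{mt} = \nu_{mt-1}$ (while $LZ_t^{(m)} = Z_{t-1}^{(m)}$ on the left-hand side). Observe that $v_t = f(\nu_{mt}, \nu_{mt-1}, \ldots, \nu_{mt-r(m-1)-p})$. This suffices to show that the process $(v_t)$ is a moving average. The largest index $k$ for which $v_t$ and $v_{t-k}$ have a common term $\nu_j$ is such that $r(m-1) + p - m < mk \leq r(m-1) + p$. Thus $k = r + [\frac{p-r}{m}]$, which gives the order of the moving average part, and subsequently the orders of the ARMA representation for $\epsilon_{mt}^2$.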
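The order formula is easy to tabulate. Reading $[x]$ as the floor $\lfloor x \rfloor$ is an assumption on our part, but it is what the bound $r(m-1) + p - m < mk \leq r(m-1) + p$ in the proof gives, including when $p < r$, and it reproduces the weak GARCH(2, 1) representation found in Example 4.7. The helper name is hypothetical.

```python
def aggregated_orders(r, p, m):
    """Orders (r, r + [(p - r)/m]) of the weak GARCH representation of (eps_{mt}) (Theorem 4.3)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    # [x] is read as floor(x); Python's // floors, also when p - r is negative
    return r, r + (p - r) // m


# GARCH(1, 1) stays GARCH(1, 1); GARCH(1, 3) sampled every 2 periods becomes GARCH(1, 2)
for r, p, m in [(1, 1, 5), (1, 3, 2), (2, 1, 2), (1, 4, 3)]:
    print((r, p, m), "->", aggregated_orders(r, p, m))
```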


This proof suggests the following scheme for deriving the exact form of the ARMA representation:


(i) The AR coefficients are deduced from the roots of the AR polynomial; but in the previous proof we saw that these roots, for the aggregate process, are the $m$th powers of the roots of the initial AR polynomial.


(ii) The constant term immediately follows from the AR coefficients and the expectation of the process: $E(\epsilon_{mt}^2) = E(\epsilon_t^2)$.


(iii) The derivation of the MA part is more tedious and requires computing the first $r + [\frac{p-r}{m}]$ autocovariances of the process $(\epsilon_{mt}^2)$; these autocovariances follow from the ARMA representation for $\epsilon_t^2$.


An alternative method involves multiplying the ARMA representation of $(\epsilon_t^2)$, written in polynomial form, by a well-chosen lag polynomial so as to directly obtain the AR part of the ARMA representation of $(\epsilon_{mt}^2)$. Let $P(L) = \prod_{i=1}^{r}(1 - \lambda_i L)$ denote the AR polynomial of the representation of $(\epsilon_t^2)$. The AR polynomial of the representation of $(\epsilon_{mt}^2)$ is given by

$$P(L)\prod_{i=1}^{r}(1 + \lambda_i L + \cdots + \lambda_i^{m-1}L^{m-1}) = \prod_{i=1}^{r}(1 - \lambda_i^m L^m).$$
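Step (i) of the scheme can be sketched numerically: find the inverse roots $\lambda_i$ of the AR polynomial, raise them to the power $m$, and rebuild the polynomial. A minimal sketch using `numpy.roots` and `numpy.poly`; the function name is hypothetical, and the AR coefficients are given as $a_1, \ldots, a_r$ in the convention of (4.3).

```python
import numpy as np


def aggregated_ar_coefficients(ar, m):
    """AR coefficients (A_1, ..., A_r) of eps_{mt}^2 from a_1, ..., a_r of eps_t^2.

    1 - a_1 L - ... - a_r L^r = prod_i (1 - lambda_i L), and the aggregate has
    AR polynomial prod_i (1 - lambda_i^m L) in the lag of the sampled series.
    """
    ar = np.asarray(ar, dtype=float)
    # the lambda_i are the roots of x^r - a_1 x^{r-1} - ... - a_r
    lambdas = np.roots(np.concatenate(([1.0], -ar)))
    # monic polynomial with roots lambda_i^m: x^r + c_1 x^{r-1} + ... + c_r, so A_i = -c_i
    coeffs = np.poly(lambdas**m)
    return -np.real_if_close(coeffs[1:])


print(aggregated_ar_coefficients([0.3, 0.1], m=2))   # setting of Example 4.7
print(aggregated_ar_coefficients([0.9], m=3))        # GARCH(1, 1): a_(m) = a^m
```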
Example 4.7 (Computation of a weak GARCH representation) Consider the GARCH(2, 1) process defined by

$$\epsilon_t = \sigma_t\eta_t, \quad (\eta_t) \text{ iid } (0, 1), \qquad \sigma_t^2 = 1 + 0.1\epsilon_{t-1}^2 + 0.1\epsilon_{t-2}^2 + 0.2\sigma_{t-1}^2,$$

and let us derive the weak GARCH representation of the process $(\epsilon_{2t})$.
The ARMA representation of $(\epsilon_t^2)$ is written as

$$\epsilon_t^2 = 1 + 0.3\epsilon_{t-1}^2 + 0.1\epsilon_{t-2}^2 + \nu_t - 0.2\nu_{t-1},$$

that is,

$$(1 - 0.5L)(1 + 0.2L)\epsilon_t^2 = 1 + (1 - 0.2L)\nu_t.$$

Multiplying this equation by $(1 + 0.5L)(1 - 0.2L)$, we obtain

$$(1 - 0.25L^2)(1 - 0.04L^2)\epsilon_t^2 = 1.2 + (1 + 0.5L)(1 - 0.2L)^2\nu_t.$$

Set $v_t = (1 + 0.5L)(1 - 0.2L)^2\nu_{2t}$. The process $(v_t)$ is MA(1), $v_t = u_t - \theta u_{t-1}$, where $\theta = 0.156$ is the solution, with absolute value less than 1, of the equation

$$\frac{\theta}{1 + \theta^2} = -\frac{\mathrm{Cov}(v_t, v_{t-1})}{\mathrm{Var}(v_t)} = \frac{0.16 - 0.02 \times 0.1}{1 + 0.1^2 + 0.16^2 + 0.02^2} = 0.1525.$$

The weak GARCH(2, 1) representation of the process $(\epsilon_{2t})$ is then

$$\epsilon_{2t}^2 = 1.2 + 0.29\epsilon_{2(t-1)}^2 - 0.01\epsilon_{2(t-2)}^2 + u_t - 0.156u_{t-1}.$$

Observe that the sign of the coefficient of $\epsilon_{2(t-2)}^2$ is not compatible with a strong or semi-strong GARCH.
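The arithmetic of Example 4.7 can be reproduced in a few lines: expand the MA filter $(1 + 0.5L)(1 - 0.2L)^2$, compute the lag-one autocovariance of $v_t$ at the sampled frequency (terms two original periods apart overlap), and solve $\theta/(1 + \theta^2) = r$ for the root of modulus less than 1. A minimal sketch; variable names are illustrative and moments are expressed in units of $\mathrm{Var}(\nu_t)$.

```python
import math

import numpy as np

m = 2
# MA filter applied to nu at the original frequency: (1 + 0.5L)(1 - 0.2L)^2
ma = np.convolve([1.0, 0.5], np.convolve([1.0, -0.2], [1.0, -0.2]))
# v_t = sum_j ma[j] * nu_{2t-j}; moments below are in units of Var(nu_t)
var_v = float(np.sum(ma**2))
cov_v1 = float(np.sum(ma[m:] * ma[:-m]))          # Cov(v_t, v_{t-1}): coefficients m apart
r = -cov_v1 / var_v                               # equals theta / (1 + theta^2)
theta = (1 - math.sqrt(1 - 4 * r**2)) / (2 * r)   # root with |theta| < 1

# AR part (1 - 0.25L)(1 - 0.04L) and constant 1.2 = 1 * (1 + 0.5)(1 - 0.2)
ar_poly = np.convolve([1.0, -0.25], [1.0, -0.04])
const = 1.0 * (1 + 0.5) * (1 - 0.2)
print(f"r = {r:.4f}, theta = {theta:.3f}, AR polynomial = {ar_poly}, constant = {const:.1f}")
```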


4.4 Bibliographical Notes 


The main results concerning the temporal aggregation of GARCH models were established by
Drost and Nijman (1993). It should be noted that our definition of weak GARCH models is not
exactly the same as theirs: in Definition 4.1, the noise $(\nu_t)$ is not the strong innovation of $(\epsilon_t^2)$, but
only the linear one. Drost and Werker (1996) introduced the notion of the continuous-time GARCH
process and deduced the corresponding weak GARCH models at the different frequencies (see also 
Drost, Nijman and Werker, 1998). The problem of the contemporaneous aggregation of independent 
GARCH processes was studied by Nijman and Sentana (1996). 

Model (4.5) belongs to the class of quadratic ARCH models introduced by Sentana 
(1995). GARCH models observed with measurement errors are dealt with by Harvey, Ruiz and 
Sentana (1992), Gouriéroux, Monfort and Renault (1993) and King, Sentana and Wadhwani (1994). 
Example 4.4 belongs to the class of stochastic autoregressive volatility (SARV) models introduced 
by Andersen (1994). The β-ARCH model was introduced by Diebolt and Guégan (1991).

Markov-switching ARMA(p, q) models were introduced by Hamilton (1989). Pagan and
Schwert (1990) considered a variant of such models for modeling the conditional variance of
financial series. Model (4.7) was studied by Cai (1994) and Dueker (1997); see also Hamilton and
Susmel (1994). The probabilistic properties of the Markov-switching GARCH models were studied
by Francq, Roussignol and Zakoian (2001). The existence of ARMA representations for powers
of $\epsilon_t$ (as in (4.9)) was established by Francq and Zakoïan (2005), and econometric applications of
this property were studied by Francq and Zakoian (2008).

The examples of weak GARCH models discussed in this chapter were analyzed by Francq and 
Zakoian (2000), where a two-step least-squares method of estimation of weak ARMA-GARCH 
was also proposed. 
4.5 Exercises


4.1 (Aggregate strong ARCH(1) process)
Show that the process $(\epsilon_{mt})$ obtained by temporal aggregation of a strong ARCH(1) process $(\epsilon_t)$ is a semi-strong ARCH.


4.2 (Aggregate weak GARCH(1, 2) process)
State the equivalent of Theorem 4.1 for a GARCH(1, 2) process.


4.3 (GARCH with measurement error)
Show that in Example 4.1 we have $\mathrm{Cov}(e_t^2, e_{t-h}^2) = \mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2)$ for all $h > 0$. Use this result to deduce the weak GARCH representation of $(e_t)$.


4.4 (Quadratic ARCH)
Verify that the process $(v_t)$ defined in (4.6) is an MA(max{p, q}) process.


4.5 (Stochastic volatility model)
In model (4.10), the volatility equation can be written as

$$\sigma_t^2 = A(v_t) + B(v_t)\sigma_{t-1}^2,$$

where $A(v_t) = c + av_t$, $B(v_t) = d + bv_t$. Suppose that $d^2 + b^2 < 1$.

1. Show that
$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) = \mathrm{Cov}\{B(v_t)\sigma_{t-1}^2,\ A(v_{t-h}) + B(v_{t-h})\sigma_{t-h-1}^2\}, \quad \forall h > 0.$$
2. Express $\sigma_t^2$ as a function of $\sigma_{t-h}^2$ and of the process $(v_t)$ and deduce that, for all $h > 0$,
$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) = [E\{B(v_t)\}]^{h}\big[2\mathrm{Cov}\{A(v_t), B(v_t)\}E\sigma_t^2 + \mathrm{Var}\{A(v_t)\} + \mathrm{Var}\{B(v_t)\}(E\sigma_t^2)^2 + \mathrm{Var}(\sigma_t^2)E\{B(v_t)^2\}\big].$$
3. Using the second-order stationarity of $(\sigma_t^2)$, compute $E(\sigma_t^2)$ and $\mathrm{Var}(\sigma_t^2)$ and determine $\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2)$ for $h > 0$.
4. Conclude that (4.11) holds and explain how to obtain β.


4.6 (Independent-switching model)
Consider model (4.7)-(4.8) in the particular case where the chain $(\Delta_t)$ is an iid process (that is, when $p(i, j)$ does not depend on $i$, for any $(i, j)$). Give a more explicit form for the weak GARCH representation (4.9).


4.7 (A two-regime Markov-switching model without ARCH coefficients)
In model (4.7)-(4.8), suppose that $p = q = 0$ (that is, $\sigma_t^2 = \omega(\Delta_t)$) and take for $(\Delta_t)$ a two-state Markov chain with $0 < p_{01} < 1$, $0 < p_{10} < 1$. Let $\pi(i) = P(\Delta_t = i)$. Denote by $p^{(k)}(i, j)$ the $k$-step transition probabilities, that is, the entries of $P^k$. Set $\lambda = p(1, 1) + p(2, 2) - 1$.

1. Compute $E\epsilon_t^2$.
2. Show that, for $i, j = 1, 2$,
$$p^{(k)}(i, j) - \pi(j) = \lambda^k\{\mathbb{1}_{\{i = j\}} - \pi(j)\}. \tag{4.15}$$
3. Deduce that, for $k > 0$,
$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-k}^2) = \lambda^k(\omega_1 - \omega_2)^2\pi(1)\pi(2).$$
4. Compute $\mathrm{Var}(\epsilon_t^2)$.
5. Deduce that $\epsilon_t^2$ has an ARMA(1, 1) representation and determine the AR coefficient.
6. Simplify this representation in the case $p_{01} + p_{10} = 1$.
7. Determine numerically the ARMA(1, 1) representation for the model:
$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = 0.2\,\mathbb{1}_{\{\Delta_t = 1\}} + 46.0\,\mathbb{1}_{\{\Delta_t = 2\}}, \qquad p(1, 1) = 0.98, \ p(2, 1) = 0.38,$$
where $\eta_t \sim \mathcal{N}(0, 1)$.


4.8 (Bilinear model)
Let $\epsilon_t = \eta_t\eta_{t-1}$, where $(\eta_t)$ is a strong white noise with unit variance such that $E(\eta_t^8) < \infty$.

1. Show that the process $(\epsilon_t)$ is a weak GARCH.
2. Show that the process $(\epsilon_t^2 - 1)$ is a weak ARMA-GARCH.


4.9 (Contemporaneous aggregation)
Using the method of the proof of Theorem 4.3, generalize Example 4.5 by considering the contemporaneous aggregation of independent strong GARCH processes of any orders.


Part II


Statistical Inference 


Identification 


In this chapter, we consider the problem of selecting an appropriate GARCH or ARMA-GARCH model for given observations $X_1, \dots, X_n$ of a centered stationary process. A large part of the theory of finance rests on the assumption that prices follow a random walk. The price variation process, $X = (X_t)$, should thus constitute a martingale difference sequence, and should coincide with its innovation process, $\epsilon = (\epsilon_t)$. The first question addressed in this chapter, in Section 5.1, will be the test of this property, or at least of one of its consequences: absence of correlation. The problem is far from trivial because standard tests for noncorrelation are actually valid under an independence assumption. Such an assumption is too strong for GARCH processes, which are dependent though uncorrelated.

If significant sample autocorrelations are detected in the price variations — in other words, if 
the random walk assumption cannot be sustained — the practitioner will try to fit an ARMA(P, Q) 
model to the data before using a GARCH(p, q) model for the residuals. Identification of the 
orders (P, Q) will be treated in Section 5.2, identification of the orders (p,q) in Section 5.3. 
Tests of the ARCH effect (and, more generally, Lagrange multiplier tests) will be considered in 
Section 5.4. 


5.1 Autocorrelation Check for White Noise 


Consider the GARCH(p, q) model

$$\begin{cases} \epsilon_t = \sigma_t\eta_t \\ \sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2 \end{cases} \qquad (5.1)$$

with $(\eta_t)$ a sequence of iid centered variables with unit variance, $\omega > 0$, $\alpha_i \geq 0$ ($i = 1, \dots, q$), $\beta_j \geq 0$ ($j = 1, \dots, p$). We saw in Section 2.2 that, whatever the orders $p$ and $q$, the nonanticipative second-order stationary solution of (5.1) is a white noise, that is, a centered process whose theoretical autocorrelations $\rho(h) = E\epsilon_t\epsilon_{t+h}/E\epsilon_t^2$ satisfy $\rho(h) = 0$ for all $h \neq 0$.
Given observations $\epsilon_1, \dots, \epsilon_n$, the theoretical autocorrelations of a centered process $(\epsilon_t)$ are generally estimated by the sample autocorrelations (SACRs)

$$\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}, \qquad \hat\gamma(h) = \hat\gamma(-h) = n^{-1}\sum_{t=1}^{n-h}\epsilon_t\epsilon_{t+h}, \qquad (5.2)$$

for $h = 0, 1, \dots, n-1$. According to Theorem 1.1, if $(\epsilon_t)$ is an iid sequence of centered random variables with finite variance then

$$\sqrt{n}\,\hat\rho(h) \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1),$$

for all $h \neq 0$. For a strong white noise, the SACRs thus lie between the confidence bounds $\pm 1.96/\sqrt{n}$ with a probability of approximately 95% when $n$ is large. In standard software, these bounds at the 5% level are generally displayed with dotted lines, as in Figure 5.2. These significance bands are not valid for a weak white noise, in particular for a GARCH process (Exercises 5.3 and 5.4). Valid asymptotic bands are derived in the next section.
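The 95% claim for a strong white noise can be checked by simulation. The following is a minimal R sketch, assuming iid $\mathcal{N}(0,1)$ data, $n = 1000$ and 500 replications (all illustrative choices); the helper `sacr` is a hypothetical name that computes $\hat\rho(h)$ exactly as in (5.2), without mean correction, since the process is centered.

```r
# Lag-h SACR as in (5.2) for a centered series (no mean correction);
# the divisors n of gamma_hat(h) and gamma_hat(0) cancel in the ratio.
sacr <- function(x, h) {
  n <- length(x)
  sum(x[1:(n - h)] * x[(h + 1):n]) / sum(x^2)
}

set.seed(1)
n <- 1000
nrep <- 500
# TRUE when the lag-1 SACR of an iid N(0,1) sample lies outside +/- 1.96/sqrt(n)
outside <- replicate(nrep, abs(sacr(rnorm(n), 1)) > 1.96 / sqrt(n))
freq <- mean(outside)
cat("empirical rejection frequency:", freq, "\n")  # should be close to 0.05
```

Replacing `rnorm(n)` by a simulated GARCH path would typically push this frequency well above 5%, which is the point made in the next section.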


5.1.1 Behavior of the Sample Autocorrelations of a GARCH Process

Let $\hat\rho_m = (\hat\rho(1), \dots, \hat\rho(m))'$ denote the vector of the first $m$ SACRs, based on $n$ observations of the GARCH(p, q) process defined by (5.1). Let $\hat\gamma_m = (\hat\gamma(1), \dots, \hat\gamma(m))'$ denote the vector of sample autocovariances (SACVs).


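Theorem 5.1 (Asymptotic distributions of the SACVs and SACRs) If $(\epsilon_t)$ is the nonanticipative and stationary solution of the GARCH(p, q) model (5.1) and if $E\epsilon_t^4 < \infty$, then, when $n \to \infty$,

$$\sqrt{n}\,\hat\gamma_m \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma_{\hat\gamma_m}) \quad\text{and}\quad \sqrt{n}\,\hat\rho_m \xrightarrow{\mathcal{L}} \mathcal{N}\left(0, \Sigma_{\hat\rho_m} := \{E\epsilon_t^2\}^{-2}\Sigma_{\hat\gamma_m}\right),$$

where

$$\Sigma_{\hat\gamma_m} = \begin{pmatrix} E\epsilon_t^2\epsilon_{t-1}^2 & E\epsilon_t^2\epsilon_{t-1}\epsilon_{t-2} & \cdots & E\epsilon_t^2\epsilon_{t-1}\epsilon_{t-m} \\ E\epsilon_t^2\epsilon_{t-1}\epsilon_{t-2} & E\epsilon_t^2\epsilon_{t-2}^2 & & \vdots \\ \vdots & & \ddots & \vdots \\ E\epsilon_t^2\epsilon_{t-1}\epsilon_{t-m} & \cdots & \cdots & E\epsilon_t^2\epsilon_{t-m}^2 \end{pmatrix}$$

is nonsingular. If the law of $\eta_t$ is symmetric then $\Sigma_{\hat\gamma_m}$ is diagonal.

Note that $\Sigma_{\hat\rho_m} = I_m$ when $(\epsilon_t)$ is a strong white noise, in accordance with Theorem 1.1.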


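Proof. Let $\tilde\gamma_m = (\tilde\gamma(1), \dots, \tilde\gamma(m))'$, where $\tilde\gamma(h) = n^{-1}\sum_{t=1}^{n}\epsilon_t\epsilon_{t-h}$. Since, for $m$ fixed,

$$\left\|\sqrt{n}\,\hat\gamma_m - \sqrt{n}\,\tilde\gamma_m\right\|_2 = \left\{\sum_{h=1}^{m}\left(\frac{1}{\sqrt{n}}\sum_{t=1}^{h}\epsilon_t\epsilon_{t-h}\right)^2\right\}^{1/2} \leq \frac{1}{\sqrt{n}}\sum_{h=1}^{m}\sum_{t=1}^{h}|\epsilon_t\epsilon_{t-h}| \to 0$$

in probability as $n \to \infty$, the asymptotic distribution of $\sqrt{n}\,\hat\gamma_m$ coincides with that of $\sqrt{n}\,\tilde\gamma_m$. Let $h$ and $k$ belong to $\{1, \dots, m\}$. By stationarity,

$$\operatorname{Cov}\{\sqrt{n}\,\tilde\gamma(h), \sqrt{n}\,\tilde\gamma(k)\} = \frac{1}{n}\sum_{t,s=1}^{n}\operatorname{Cov}(\epsilon_t\epsilon_{t-h}, \epsilon_s\epsilon_{s-k}) = \frac{1}{n}\sum_{\ell=-n+1}^{n-1}(n - |\ell|)\operatorname{Cov}(\epsilon_t\epsilon_{t-h}, \epsilon_{t+\ell}\epsilon_{t+\ell-k}) = E\epsilon_t^2\epsilon_{t-h}\epsilon_{t-k}$$

because

$$\operatorname{Cov}(\epsilon_t\epsilon_{t-h}, \epsilon_{t+\ell}\epsilon_{t+\ell-k}) = \begin{cases} E\epsilon_t^2\epsilon_{t-h}\epsilon_{t-k} & \text{if } \ell = 0, \\ 0 & \text{otherwise.} \end{cases}$$

From this, we deduce the expression for $\Sigma_{\hat\gamma_m}$. From the Cramér–Wold theorem,¹ the asymptotic normality of $\sqrt{n}\,\tilde\gamma_m$ will follow from showing that for all nonzero $\lambda = (\lambda_1, \dots, \lambda_m)' \in \mathbb{R}^m$,

$$\sqrt{n}\,\lambda'\tilde\gamma_m \xrightarrow{\mathcal{L}} \mathcal{N}(0, \lambda'\Sigma_{\hat\gamma_m}\lambda). \qquad (5.3)$$

Let $\mathcal{F}_t$ denote the $\sigma$-field generated by $\{\epsilon_u, u \leq t\}$. We obtain (5.3) by applying a central limit theorem (CLT) to the sequence $(\epsilon_t\sum_{i=1}^{m}\lambda_i\epsilon_{t-i}, \mathcal{F}_t)_t$, which is a stationary, ergodic and square integrable martingale difference (see Corollary A.1).

The asymptotic behavior of $\hat\rho_m$ immediately follows from that of $\hat\gamma_m$ (as in Exercise 5.3).

Reasoning by contradiction, suppose that $\Sigma_{\hat\gamma_m}$ is singular. Then, because this matrix is the covariance of the vector $(\epsilon_t\epsilon_{t-1}, \dots, \epsilon_t\epsilon_{t-m})'$, there exists an exact linear combination of the components of $(\epsilon_t\epsilon_{t-1}, \dots, \epsilon_t\epsilon_{t-m})'$ that is equal to zero. For some $i_0 \geq 1$, we then have $\epsilon_t\epsilon_{t-i_0} = \sum_{i=i_0+1}^{m}\lambda_i\epsilon_t\epsilon_{t-i}$, that is, $\epsilon_{t-i_0}\mathbb{1}_{\{\eta_t \neq 0\}} = \sum_{i=i_0+1}^{m}\lambda_i\epsilon_{t-i}\mathbb{1}_{\{\eta_t \neq 0\}}$. Hence,

$$E\epsilon_{t-i_0}^2\mathbb{1}_{\{\eta_t \neq 0\}} = \sum_{i=i_0+1}^{m}\lambda_i E(\epsilon_{t-i_0}\epsilon_{t-i}\mathbb{1}_{\{\eta_t \neq 0\}}) = \sum_{i=i_0+1}^{m}\lambda_i E(\epsilon_{t-i_0}\epsilon_{t-i})P(\eta_t \neq 0) = 0,$$

which is absurd. It follows that $\Sigma_{\hat\gamma_m}$ is nonsingular.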
When the law of $\eta_t$ is symmetric, the diagonal form of $\Sigma_{\hat\gamma_m}$ is a consequence of property (7.24) in Chapter 7. See Exercises 5.5 and 5.6 for the GARCH(1, 1) case.


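A consistent estimator $\hat\Sigma_{\hat\gamma_m}$ of $\Sigma_{\hat\gamma_m}$ is obtained by replacing the generic term of $\Sigma_{\hat\gamma_m}$ by

$$n^{-1}\sum_{t=1}^{n}\epsilon_t^2\epsilon_{t-i}\epsilon_{t-j},$$

with, by convention, $\epsilon_s = 0$ for $s < 1$. Clearly, $\hat\Sigma_{\hat\rho_m} := \hat\gamma^{-2}(0)\hat\Sigma_{\hat\gamma_m}$ is a consistent estimator of $\Sigma_{\hat\rho_m}$ and is almost surely invertible for $n$ large enough. This can be used to construct asymptotic significance bands for the SACRs of a GARCH process.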
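The estimator $\hat\Sigma_{\hat\rho_m}$ can be computed in a few lines of R. The sketch below is illustrative only: `sigma_rho_hat` and `sim_garch11` are hypothetical helper names, the GARCH(1,1) simulator assumes Gaussian $\eta_t$ and a burn-in of 500 observations, and the parameters are those of model (5.4) introduced later in this section. The bands $\pm 1.96\sqrt{\hat\Sigma_{\hat\rho_m}(i,i)/n}$ are computed from the diagonal.

```r
# Estimated Sigma_{rho_m}: generic term n^{-1} sum_t eps_t^2 eps_{t-i} eps_{t-j}
# (with eps_s = 0 for s < 1), divided by gamma_hat(0)^2.
sigma_rho_hat <- function(eps, m) {
  n <- length(eps)
  # column i holds eps_{t-i}, t = 1..n, padded with zeros at the start
  lagged <- sapply(1:m, function(i) c(rep(0, i), eps[1:(n - i)]))
  sigma_gamma <- crossprod(lagged * eps^2, lagged) / n
  gamma0 <- sum(eps^2) / n
  sigma_gamma / gamma0^2
}

# Hypothetical helper: GARCH(1,1) path with N(0,1) noise, burn-in discarded
sim_garch11 <- function(n, omega, alpha, beta, burn = 500) {
  eta <- rnorm(n + burn)
  eps <- numeric(n + burn)
  sig2 <- omega / (1 - alpha - beta)       # start at the unconditional variance
  for (t in seq_len(n + burn)) {
    if (t > 1) sig2 <- omega + alpha * eps[t - 1]^2 + beta * sig2
    eps[t] <- sqrt(sig2) * eta[t]
  }
  tail(eps, n)
}

set.seed(42)
eps <- sim_garch11(5000, omega = 1, alpha = 0.3, beta = 0.55)
S <- sigma_rho_hat(eps, m = 5)
bands <- 1.96 * sqrt(diag(S) / length(eps))   # GARCH significance bands
```

Using `crossprod` keeps the computation vectorized: it forms all the sums $\sum_t \epsilon_t^2\epsilon_{t-i}\epsilon_{t-j}$ at once.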


¹ For any sequence $(Z_n)$ of random vectors of size $d$, $Z_n \xrightarrow{\mathcal{L}} Z$ if and only if, for all $\lambda \in \mathbb{R}^d$, we have $\lambda'Z_n \xrightarrow{\mathcal{L}} \lambda'Z$.
Practical Implementation

The following R code allows us to draw a given number of autocorrelations $\hat\rho(i)$ and the significance bands $\pm 1.96\sqrt{\hat\Sigma_{\hat\rho_m}(i, i)/n}$.


# autocorrelation function
gamma<-function(x,h){ n<-length(x); h<-abs(h); x<-x-mean(x)
  sum(x[1:(n-h)]*x[(h+1):n])/n }
rho<-function(x,h) gamma(x,h)/gamma(x,0)
# acf function with significance bands of a strong white noise (dotted,
# drawn by acf) and of a semi-strong white noise (solid red lines)
nl.acf<-function(x,main=NULL,method='NP'){
  n<-length(x); nlag<-as.integer(min(10*log10(n),n-1))
  acf.val<-sapply(c(1:nlag),function(h) rho(x,h))
  x2<-x^2
  # estimated diagonal of Sigma_rho: 1 + gamma_{x^2}(h)/gamma_x(0)^2
  var<-1+(sapply(c(1:nlag),function(h) gamma(x2,h)))/gamma(x,0)^2
  band<-sqrt(var/n)
  minval<-1.2*min(acf.val,-1.96*band,-1.96/sqrt(n))
  maxval<-1.2*max(acf.val,1.96*band,1.96/sqrt(n))
  acf(x,xlab='Lag',ylab='SACR',ylim=c(minval,maxval),main=main)
  lines(c(1:nlag),-1.96*band,lty=1,col='red')
  lines(c(1:nlag),1.96*band,lty=1,col='red') }


In Figure 5.1 we have plotted the SACRs and their significance bands for daily series of exchange rates of the dollar, pound, yen and Swiss franc against the euro, for the period from January 4, 1999 to January 22, 2009. It can be seen that the SACRs are often outside the standard significance bands $\pm 1.96/\sqrt{n}$, which leads us to reject the strong white noise assumption for all these series. On the other hand, most of the SACRs are inside the significance bands shown as solid lines, which is in accordance with the hypothesis that the series are realizations of semi-strong white noises.
Figure 5.1 SACRs of exchange rates against the euro, standard significance bands for the SACRs of a strong white noise (dotted lines) and significance bands for the SACRs of a semi-strong white noise (solid lines).
5.1.2 Portmanteau Tests

The standard portmanteau test for checking that the data is a realization of a strong white noise is that of Ljung and Box (1978). It involves computing the statistic

$$Q_m^{LB} = n(n+2)\sum_{i=1}^{m}\frac{\hat\rho^2(i)}{n - i}$$

and rejecting the strong white noise hypothesis if $Q_m^{LB}$ is greater than the $(1 - \alpha)$-quantile of a $\chi^2_m$.²
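As an illustration, here is a minimal R sketch of $Q_m^{LB}$ computed directly from the SACRs and cross-checked against `Box.test` from the `stats` package. One assumption to note: `acf()` removes the sample mean before computing the autocorrelations, whereas (5.2) is written for a centered process; the helper name `ljung_box` is hypothetical.

```r
# Ljung-Box statistic from the SACRs (acf() subtracts the sample mean)
ljung_box <- function(x, m) {
  n <- length(x)
  rho <- acf(x, lag.max = m, plot = FALSE)$acf[-1]   # drop lag 0
  n * (n + 2) * sum(rho^2 / (n - seq_len(m)))
}

set.seed(123)
x <- rnorm(500)
q_lb <- ljung_box(x, m = 10)
p_lb <- pchisq(q_lb, df = 10, lower.tail = FALSE)   # asymptotic chi^2_m p-value
# Cross-check with the built-in implementation
q_ref <- unname(Box.test(x, lag = 10, type = "Ljung-Box")$statistic)
```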

Portmanteau tests are constructed for checking noncorrelation, but the asymptotic distribution of the statistics is no longer $\chi^2_m$ when the series departs from the strong white noise assumption. For instance, these tests are not robust to conditional heteroscedasticity. In the GARCH framework, we may wish to test simultaneously the nullity of the first $m$ autocorrelations using more robust portmanteau statistics.


Theorem 5.2 (Corrected portmanteau test in the presence of ARCH) Under the assumptions of Theorem 5.1, the portmanteau statistic

$$Q_m = n\hat\rho_m'\hat\Sigma_{\hat\rho_m}^{-1}\hat\rho_m$$

has an asymptotic $\chi^2_m$ distribution.

Proof. It suffices to use Theorem 5.1 and the following result: if $X_n \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma)$, with $\Sigma$ nonsingular, and if $\hat\Sigma_n \to \Sigma$ in probability, then $X_n'\hat\Sigma_n^{-1}X_n \xrightarrow{\mathcal{L}} \chi^2_m$.

A portmanteau test of asymptotic level $\alpha$ based on the first $m$ SACRs involves rejecting the hypothesis that the data are generated by a GARCH process if $Q_m$ is greater than the $(1 - \alpha)$-quantile of a $\chi^2_m$.
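The corrected statistic $Q_m$ of Theorem 5.2 combines the SACRs (5.2) with the estimator $\hat\Sigma_{\hat\rho_m}$ defined after Theorem 5.1. A minimal R sketch follows; the function name `corrected_portmanteau`, the GARCH(1,1) simulator (Gaussian $\eta_t$, parameters of model (5.4)) and the choice $m = 6$ are illustrative assumptions.

```r
corrected_portmanteau <- function(eps, m) {
  n <- length(eps)
  gamma0 <- sum(eps^2) / n
  # SACRs as in (5.2)
  rho <- sapply(1:m, function(h) sum(eps[1:(n - h)] * eps[(h + 1):n])) / n / gamma0
  # Estimated Sigma_{rho_m}, with eps_s = 0 for s < 1
  lagged <- sapply(1:m, function(i) c(rep(0, i), eps[1:(n - i)]))
  S <- crossprod(lagged * eps^2, lagged) / n / gamma0^2
  Q <- n * sum(rho * solve(S, rho))          # n rho' S^{-1} rho
  c(statistic = Q, p.value = pchisq(Q, df = m, lower.tail = FALSE))
}

# Hypothetical helper: GARCH(1,1) path with N(0,1) noise, burn-in discarded
sim_garch11 <- function(n, omega, alpha, beta, burn = 500) {
  eta <- rnorm(n + burn); eps <- numeric(n + burn)
  sig2 <- omega / (1 - alpha - beta)
  for (t in seq_len(n + burn)) {
    if (t > 1) sig2 <- omega + alpha * eps[t - 1]^2 + beta * sig2
    eps[t] <- sqrt(sig2) * eta[t]
  }
  tail(eps, n)
}

set.seed(7)
eps <- sim_garch11(5000, omega = 1, alpha = 0.3, beta = 0.55)
res <- corrected_portmanteau(eps, m = 6)
```

`solve(S, rho)` avoids forming the inverse explicitly, which is the usual numerically preferable way to evaluate a quadratic form.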


5.1.3 Sample Partial Autocorrelations of a GARCH

Denote by $r_m$ ($\hat r_m$) the vector of the $m$ first partial autocorrelations (sample partial autocorrelations (SPACs)) of the process $(\epsilon_t)$. By Theorem B.3, we know that for a weak white noise, the SACRs and SPACs have the same asymptotic distribution. This applies in particular to a GARCH process. Consequently, under the hypothesis of GARCH white noise with a finite fourth-order moment, consistent estimators of $\Sigma_{\hat r_m}$ are

$$\hat\Sigma^{(1)}_{\hat r_m} = \hat\Sigma_{\hat\rho_m} \quad\text{and}\quad \hat\Sigma^{(2)}_{\hat r_m} = \hat J_m\hat\Sigma_{\hat\rho_m}\hat J_m',$$

where $\hat J_m$ is the matrix obtained by replacing $\rho_X(1), \dots, \rho_X(m)$ by $\hat\rho_X(1), \dots, \hat\rho_X(m)$ in the Jacobian matrix $J_m$ of the mapping $\rho_m \mapsto r_m$, and $\hat\Sigma_{\hat\rho_m}$ is the consistent estimator of $\Sigma_{\hat\rho_m}$ defined after Theorem 5.1.
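The mapping $\rho_m \mapsto r_m$ is the Durbin–Levinson recursion, so $\hat J_m$ can be approximated numerically. The R sketch below is one way to do it, assuming central finite differences with step $10^{-5}$; `rho_to_pacf` and `jacobian_pacf` are hypothetical names. At $\rho_m = 0$ the Jacobian is the identity, which is why $\hat\Sigma^{(1)}_{\hat r_m}$ and $\hat\Sigma^{(2)}_{\hat r_m}$ are asymptotically equivalent under the white noise hypothesis.

```r
# Durbin-Levinson map rho(1..m) -> r(1..m) (autocorrelations to PACF)
rho_to_pacf <- function(rho) {
  m <- length(rho)
  r <- numeric(m)
  phi <- numeric(0)   # phi_{k-1,1}, ..., phi_{k-1,k-1}
  v <- 1              # normalized one-step prediction error variance
  for (k in seq_len(m)) {
    past <- if (k > 1) rho[(k - 1):1] else numeric(0)   # rho(k-1), ..., rho(1)
    a <- (rho[k] - sum(phi * past)) / v                 # phi_{k,k} = r(k)
    phi <- c(phi - a * rev(phi), a)                     # update phi_{k,1..k}
    v <- v * (1 - a^2)
    r[k] <- a
  }
  r
}

# Central-difference Jacobian J[i, k] = d r(i) / d rho(k) at rho0
jacobian_pacf <- function(rho0, h = 1e-5) {
  m <- length(rho0)
  sapply(seq_len(m), function(k) {
    e <- numeric(m); e[k] <- h
    (rho_to_pacf(rho0 + e) - rho_to_pacf(rho0 - e)) / (2 * h)
  })
}

J0 <- jacobian_pacf(rep(0, 4))   # identity matrix at rho = 0
```

With $\hat J_m$ = `jacobian_pacf(rho_hat)`, the second estimator is `J %*% S %*% t(J)`.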

Although it is not current practice, one can test the simultaneous nullity of several theoretical partial autocorrelations using portmanteau tests based on the statistics

$$Q_m^{r,BP} = n\hat r_m'\hat r_m \quad\text{and}\quad Q_m^{r} = n\hat r_m'\left\{\hat\Sigma^{(i)}_{\hat r_m}\right\}^{-1}\hat r_m$$


² The asymptotic distribution of $Q_m^{LB}$ is $\chi^2_m$. The Box and Pierce (1970) statistic $Q_m^{BP} := n\sum_{i=1}^{m}\hat\rho^2(i)$ has the same asymptotic distribution, but the $Q_m^{LB}$ statistic is believed to perform better for finite samples.
Figure 5.2 SACRs of a simulation of a strong white noise (left) and of the GARCH(1, 1) white noise (5.4) (right). Approximately 95% of the SACRs of a strong white noise should lie inside the thin dotted lines $\pm 1.96/\sqrt{n}$. Approximately 95% of the SACRs of a GARCH(1, 1) white noise should lie inside the thick dotted lines.


with, for instance, $i = 2$. From Theorem B.3, under the strong white noise assumption, the statistics $Q_m^{r,BP}$, $Q_m^{BP}$ and $Q_m^{LB}$ have the same $\chi^2_m$ asymptotic distribution. Under the hypothesis of a pure GARCH process, the statistics $Q_m^{r}$ and $Q_m$ also have the same $\chi^2_m$ asymptotic distribution.


5.1.4 Numerical Illustrations

Standard Significance Bounds for the SACRs are not Valid

The right-hand graph of Figure 5.2 displays the sample correlogram of a simulation of size $n = 5000$ of the GARCH(1, 1) white noise

$$\begin{cases} \epsilon_t = \sigma_t\eta_t \\ \sigma_t^2 = 1 + 0.3\epsilon_{t-1}^2 + 0.55\sigma_{t-1}^2, \end{cases} \qquad (5.4)$$

where $(\eta_t)$ is a sequence of iid $\mathcal{N}(0, 1)$ variables. It is seen that the SACRs of order 2 and 4 are sharply outside the 95% significance bands computed under the strong white noise assumption. An inexperienced practitioner could be tempted to reject the hypothesis of white noise, in favor of a more complicated ARMA model whose residual autocorrelations would lie between the significance bounds $\pm 1.96/\sqrt{n}$. To avoid this type of specification error, one has to be aware that the bounds $\pm 1.96/\sqrt{n}$ are not valid for the SACRs of a GARCH white noise. In our simulation, it is possible to compute exact asymptotic bounds at the 95% level (Exercise 5.4). In the right-hand graph of Figure 5.2, these bounds are drawn as thick dotted lines. All the SACRs are now inside, or very slightly outside, those bounds. If we had been given the data, with no prior information, this graph would have given us no grounds on which to reject the simple hypothesis that the data is a realization of a GARCH white noise.
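For model (5.4) the exact asymptotic bounds have a closed form, since with symmetric $\eta_t$ the diagonal of $\Sigma_{\hat\rho_m}$ is $E\epsilon_t^2\epsilon_{t-h}^2/(E\epsilon_t^2)^2 = 1 + \operatorname{Cov}(\epsilon_t^2, \epsilon_{t-h}^2)/(E\epsilon_t^2)^2$. The R sketch below relies on the standard moment formulas of a GARCH(1,1) with $\kappa_\eta = E\eta_t^4$ (here $\kappa_\eta = 3$, Gaussian case) and on $\rho_{\epsilon^2}(h) = \rho_{\epsilon^2}(1)(\alpha + \beta)^{h-1}$; it is one way to do the computation requested in Exercise 5.4, not necessarily the intended solution. The function name is hypothetical.

```r
garch11_sacr_bounds <- function(omega, alpha, beta, n, m, kappa_eta = 3) {
  s <- alpha + beta
  d <- 1 - beta^2 - 2 * alpha * beta - kappa_eta * alpha^2
  stopifnot(s < 1, d > 0)                                 # finite fourth moment
  m2 <- omega / (1 - s)                                    # E eps_t^2
  m4 <- kappa_eta * omega^2 * (1 + s) / ((1 - s) * d)      # E eps_t^4
  rho1 <- alpha * (1 - alpha * beta - beta^2) / (1 - 2 * alpha * beta - beta^2)
  rho_sq <- rho1 * s^(0:(m - 1))                           # rho_{eps^2}(h), h = 1..m
  sigma_hh <- 1 + rho_sq * (m4 - m2^2) / m2^2              # diagonal of Sigma_rho
  1.96 * sqrt(sigma_hh / n)
}

bounds <- garch11_sacr_bounds(omega = 1, alpha = 0.3, beta = 0.55, n = 5000, m = 12)
round(rbind(garch = bounds, strong = 1.96 / sqrt(5000)), 4)
# bounds[1] is about 0.0573, roughly twice the strong white noise bound 0.0277
```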


Estimating the Significance Bounds of the SACRs of a GARCH 


Of course, in real situations the significance bounds depend on unknown parameters, and thus cannot be easily obtained. It is, however, possible to estimate them consistently, as described in Section 5.1.1. For a simulation of model (5.4) of size $n = 5000$, Figure 5.3 shows as thin dotted lines the resulting estimates of the significance bounds at the 5% level. The estimated bounds are fairly close to the exact asymptotic bounds.


The SPACs and Their Significance Bounds 


Figure 5.4 shows the SPACs of the simulation (5.4) and the estimated significance bounds of the $\hat r(h)$, at the 5% level (based on $\hat\Sigma^{(2)}_{\hat r_m}$). By comparing Figures 5.3 and 5.4, it can be seen that the SACRs and SPACs of the GARCH simulation look much alike. This is not surprising in view of Theorem B.4.

Figure 5.3 Sample autocorrelations of a simulation of size n = 5000 of the GARCH(1, 1) white noise (5.4). Approximately 95% of the SACRs of a GARCH(1, 1) white noise should lie inside the thin dotted lines. The exact asymptotic bounds are shown as thick dotted lines.

Figure 5.4 Sample partial autocorrelations of a simulation of size n = 5000 of the GARCH(1, 1) white noise (5.4). Approximately 95% of the SPACs of a GARCH(1, 1) white noise should lie inside the thin dotted lines. The exact asymptotic bounds are shown as thick dotted lines.
Portmanteau Tests of Strong White Noise and of Pure GARCH

Table 5.1 displays p-values of white noise tests based on $Q_m$ and the usual Ljung–Box statistics, for the simulation of (5.4). Apart from the test with $m = 4$, the $Q_m$ tests do not reject, at the 5% level, the hypothesis that the data come from a GARCH process. On the other hand, the Ljung–Box tests clearly reject the strong white noise assumption.

Portmanteau Tests Based on Partial Autocorrelations 

Table 5.2 is similar to Table 5.1, but presents portmanteau tests based on the SPACs. As expected, 
the results are very close to those obtained for the SACRs. 

An Example Showing that Portmanteau Tests Based on the SPACs Can Be More Powerful 
than those Based on the SACRs 


Consider a simulation of size $n = 100$ of the strong MA(2) model

$$X_t = \eta_t + 0.56\eta_{t-1} - 0.44\eta_{t-2}, \qquad \eta_t \text{ iid } \mathcal{N}(0, 1). \qquad (5.5)$$
Table 5.1 Portmanteau tests on a simulation of size n = 5000 of the GARCH(1, 1) white noise (5.4).

Tests based on $Q_m$, for the hypothesis of GARCH white noise

m                                   1        2        3        4        5        6
$\hat\rho(m)$                    0.00    -0.06    -0.03     0.05    -0.02     0.00
$\hat\sigma_{\hat\rho(m)}$      0.025    0.028    0.024    0.024    0.021    0.026
$Q_m$                            0.00     4.20     5.49    10.19    10.90    10.94
$P(\chi^2_m > Q_m)$            0.9637   0.1227   0.1391   0.0374   0.0533   0.0902

m                                   7        8        9       10       11       12
$\hat\rho(m)$                    0.02    -0.01    -0.02    -0.02     0.00     0.01
$\hat\sigma_{\hat\rho(m)}$      0.019    0.023    0.019    0.016    0.017    0.015
$Q_m$                           12.19    12.27    13.16    14.61    14.67    15.20
$P(\chi^2_m > Q_m)$            0.0967   0.1397   0.1555   0.1469   0.1979   0.2306

Usual tests, for the strong white noise hypothesis

m                                   1        2        3        4        5        6
$\hat\rho(m)$                    0.00    -0.06    -0.03     0.05    -0.02     0.00
$\hat\sigma_{\hat\rho(m)}$      0.014    0.014    0.014    0.014    0.014    0.014
$Q_m^{LB}$                       0.01    16.78    20.59    34.18    35.74    35.86
$P(\chi^2_m > Q_m^{LB})$       0.9365   0.0002   0.0001   0.0000   0.0000   0.0000

m                                   7        8        9       10       11       12
$\hat\rho(m)$                    0.02    -0.01    -0.02    -0.02     0.00     0.01
$\hat\sigma_{\hat\rho(m)}$      0.014    0.014    0.014    0.014    0.014    0.014
$Q_m^{LB}$                      38.05    38.44    39.97    41.82    41.91    42.51
$P(\chi^2_m > Q_m^{LB})$       0.0000   0.0000   0.0000   0.0000   0.0000   0.0000


By comparing the top two and bottom two parts of Table 5.3, we note that the hypotheses of strong white noise and pure GARCH are more clearly rejected when the SPACs, rather than the SACRs, are used. This follows from the fact that, for this MA(2), only two theoretical autocorrelations are not equal to 0, whereas many theoretical partial autocorrelations are far from 0. For the same reason, the results would have been inverted if, for instance, an AR(1) alternative had been considered.
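The contrast between the two kinds of statistics can be seen on the theoretical quantities of model (5.5). The following R sketch uses `ARMAacf` from the `stats` package. Note that R's MA convention is $X_t = \eta_t + \theta_1\eta_{t-1} + \theta_2\eta_{t-2}$, so the coefficients of (5.5) are entered as `c(0.56, -0.44)`.

```r
theta <- c(0.56, -0.44)                                      # MA coefficients of (5.5)
acf_ma2 <- ARMAacf(ma = theta, lag.max = 8)                  # lags 0, 1, ..., 8
pacf_ma2 <- ARMAacf(ma = theta, lag.max = 8, pacf = TRUE)    # lags 1, ..., 8
round(rbind(acf = acf_ma2[-1], pacf = pacf_ma2), 3)
# Autocorrelations vanish beyond lag 2, partial autocorrelations do not.
```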


5.2 Identifying the ARMA Orders of an ARMA-GARCH 


Assume that the tools developed in Section 5.1 lead to rejection of the hypothesis that the data is a realization of a pure GARCH process. It is then sensible to look for an ARMA(P, Q) model with GARCH innovations. The problem is then to choose (or identify) plausible orders for the model

$$X_t - \sum_{i=1}^{P}a_iX_{t-i} = \epsilon_t - \sum_{i=1}^{Q}b_i\epsilon_{t-i} \qquad (5.6)$$

under standard assumptions (the AR and MA polynomials having no common root and having roots outside the unit disk, with $a_Pb_Q \neq 0$, $E\epsilon_t^4 < \infty$), where $(\epsilon_t)$ is a GARCH white noise of the form (5.1).
Table 5.2 As Table 5.1, for tests based on partial autocorrelations instead of autocorrelations.

GARCH white noise tests based on $Q_m^{r}$

m                                   1        2        3        4        5        6
$\hat r(m)$                      0.00    -0.06    -0.03     0.05    -0.02     0.00
$\hat\sigma_{\hat r(m)}$        0.025    0.028    0.024    0.024    0.021    0.026
$Q_m^{r}$                        0.00     4.20     5.49     9.64    10.65    10.65
$P(\chi^2_m > Q_m^{r})$        0.9637   0.1227   0.1393   0.0470   0.0587   0.0998

m                                   7        8        9       10       11       12
$\hat r(m)$                      0.02    -0.01    -0.01    -0.02     0.00     0.01
$\hat\sigma_{\hat r(m)}$        0.019    0.023    0.019    0.016    0.017    0.015
$Q_m^{r}$                       11.92    12.24    12.77    14.24    14.24    14.67
$P(\chi^2_m > Q_m^{r})$        0.1032   0.1407   0.1735   0.1623   0.2200   0.2599

Strong white noise tests based on $Q_m^{r,BP}$

m                                   1        2        3        4        5        6
$\hat r(m)$                      0.00    -0.06    -0.03     0.05    -0.02     0.00
$\hat\sigma_{\hat r(m)}$        0.014    0.014    0.014    0.014    0.014    0.014
$Q_m^{r,BP}$                     0.01    16.77    20.56    32.55    34.76    34.76
$P(\chi^2_m > Q_m^{r,BP})$     0.9366   0.0002   0.0001   0.0000   0.0000   0.0000

m                                   7        8        9       10       11       12
$\hat r(m)$                      0.02    -0.01    -0.01    -0.02     0.00     0.01
$\hat\sigma_{\hat r(m)}$        0.014    0.014    0.014    0.014    0.014    0.014
$Q_m^{r,BP}$                    37.12    37.94    38.84    40.71    40.71    41.20
$P(\chi^2_m > Q_m^{r,BP})$     0.0000   0.0000   0.0000   0.0000   0.0000   0.0000


5.2.1 Sample Autocorrelations of an ARMA-GARCH

Recall that an MA(Q) satisfies $\rho_X(h) = 0$ for all $h > Q$, whereas an AR(P) satisfies $r_X(h) = 0$ for all $h > P$. The SACRs and SPACs thus play an important role in identifying the orders P and Q.


Table 5.3 White noise portmanteau tests on a simulation of size n = 100 of the MA(2) model (5.5).

Tests of GARCH white noise based on autocorrelations

m                                   1        2        3        4        5        6
$Q_m$                          1.6090   4.5728   5.5495   6.2271   6.2456   6.4654
$P(\chi^2_m > Q_m)$            0.2046   0.1016   0.1357   0.1828   0.2830   0.3731

Tests of GARCH white noise based on partial autocorrelations

m                                   1        2        3        4        5        6
$Q_m^{r}$                      1.6090   5.8059   9.8926  16.7212  21.5870  25.3162
$P(\chi^2_m > Q_m^{r})$        0.2046   0.0549   0.0195   0.0022   0.0006   0.0003

Tests of strong white noise based on autocorrelations

m                                   1        2        3        4        5        6
$Q_m^{LB}$                     3.4039   8.4085   9.8197  10.6023  10.6241  10.8905
$P(\chi^2_m > Q_m^{LB})$       0.0650   0.0149   0.0202   0.0314   0.0594   0.0918

Tests of strong white noise based on partial autocorrelations

m                                   1        2        3        4        5        6
$Q_m^{r,BP}$                   3.3038  10.1126  15.7276  23.1513  28.4720  32.6397
$P(\chi^2_m > Q_m^{r,BP})$     0.0691   0.0064   0.0013   0.0001   0.0000   0.0000

Invalidity of the Standard Bartlett Formula and Modified Formula

The validity of the usual Bartlett formula rests on assumptions including the strong white noise hypothesis (Theorem 1.1), which are obviously incompatible with GARCH errors. We shall see that this formula leads to underestimation of the variances of the SACRs and SPACs, and thus to erroneous ARMA orders. We shall only consider the SACRs because Theorem B.2 shows that the asymptotic behavior of the SPACs easily follows from that of the SACRs.

We assume throughout that the law of $\eta_t$ is symmetric. By Theorem B.5, the asymptotic behavior of the SACRs is determined by the generalized Bartlett formula (B.15). This formula involves the theoretical autocorrelations of $(X_t)$ and $(\epsilon_t^2)$, as well as the ratio $\kappa_\epsilon - 1 = \gamma_{\epsilon^2}(0)/\gamma_\epsilon^2(0)$. More precisely, using Remark 1 of Theorem 7.2.2 in Brockwell and Davis (1991), the generalized Bartlett formula is written as

$$\lim_{n\to\infty} n\operatorname{Cov}\{\hat\rho_X(i), \hat\rho_X(j)\} = v_{ij} + v^*_{ij},$$

where

$$v_{ij} = \sum_{\ell=1}^{\infty}w_i(\ell)w_j(\ell), \qquad v^*_{ij} = (\kappa_\epsilon - 1)\sum_{\ell=1}^{\infty}\rho_{\epsilon^2}(\ell)w_i(\ell)w_j(\ell), \qquad (5.7)$$

and

$$w_i(\ell) = 2\rho_X(i)\rho_X(\ell) - \rho_X(\ell + i) - \rho_X(\ell - i).$$
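Formula (5.7) is straightforward to evaluate once $\rho_X$, $\rho_{\epsilon^2}$ and $\kappa_\epsilon$ are known. The R sketch below truncates the sums at $L = 500$ and applies them to model (5.8) of this section, read as $X_t = 0.8X_{t-1} - 0.8X_{t-2} + \epsilon_t - 0.8\epsilon_{t-1}$ with a Gaussian GARCH(1,1), $\alpha = 0.2$, $\beta = 0.6$. The closed forms used for $\kappa_\epsilon$ and $\rho_{\epsilon^2}$ are the standard Gaussian GARCH(1,1) results; the helper name `bartlett_diag` is hypothetical. The output gives the standard bands (from $v_{ii}$) and the generalized bands (from $v_{ii} + v^*_{ii}$) for $n = 1000$.

```r
# Diagonal terms v_ii and v*_ii of (5.7), sums truncated at L
bartlett_diag <- function(rho_x, rho_e2, kappa, m, L = 500) {
  l <- seq_len(L)
  sapply(seq_len(m), function(i) {
    w <- 2 * rho_x(i) * rho_x(l) - rho_x(l + i) - rho_x(l - i)   # w_i(l)
    c(v = sum(w^2), v_star = (kappa - 1) * sum(rho_e2(l) * w^2))
  })
}

m <- 10; L <- 500
acf_x <- ARMAacf(ar = c(0.8, -0.8), ma = -0.8, lag.max = L + m + 1)
rho_x <- function(h) acf_x[abs(h) + 1]                   # rho_X(h), any sign of h
alpha <- 0.2; beta <- 0.6; s <- alpha + beta
kappa <- 3 * (1 - s^2) / (1 - s^2 - 2 * alpha^2)         # E eps^4 / (E eps^2)^2
rho1 <- alpha * (1 - alpha * beta - beta^2) / (1 - 2 * alpha * beta - beta^2)
rho_e2 <- function(l) rho1 * s^(abs(l) - 1)              # rho_{eps^2}(l), l >= 1
B <- bartlett_diag(rho_x, rho_e2, kappa, m, L)
n <- 1000
bands <- rbind(standard = 1.96 * sqrt(B["v", ] / n),
               generalized = 1.96 * sqrt(colSums(B) / n))
```

In line with Proposition 5.1, the `v_star` row is strictly positive, so the generalized bands are always wider.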


The following result shows that the standard Bartlett formula always underestimates the asymptotic variances of the sample autocorrelations in the presence of GARCH errors.


Proposition 5.1 Under the assumptions of Theorem B.5, if the linear innovation process $(\epsilon_t)$ is a GARCH process with $\eta_t$ symmetrically distributed, then

$$v^*_{ii} \geq 0 \quad\text{for all } i > 0.$$

If, moreover, $\alpha_1 > 0$, $\operatorname{Var}(\eta_t^2) > 0$ and $\sum_{h=-\infty}^{+\infty}\rho_X(h) \neq 0$, then

$$v^*_{ii} > 0 \quad\text{for all } i > 0.$$

Proof. From Proposition 2.2, we have $\rho_{\epsilon^2}(\ell) \geq 0$ for all $\ell$, with strict inequality when $\alpha_1 > 0$. It thus follows immediately from (5.7) that $v^*_{ii} \geq 0$. When $\alpha_1 > 0$ this inequality is strict unless $\kappa_\epsilon = 1$ or $w_i(\ell) = 0$ for all $\ell \geq 1$, that is,

$$2\rho_X(i)\rho_X(\ell) = \rho_X(\ell + i) + \rho_X(\ell - i).$$

Suppose this relation holds and note that it is also satisfied for all $\ell \in \mathbb{Z}$. Moreover, summing over $\ell$, we obtain

$$2\rho_X(i)\sum_{\ell}\rho_X(\ell) = \sum_{\ell}\{\rho_X(\ell + i) + \rho_X(\ell - i)\} = 2\sum_{\ell}\rho_X(\ell).$$

Because the sum of all autocorrelations is supposed to be nonzero, we thus have $\rho_X(i) = 1$. Taking $\ell = i$ in the previous relation, we thus find that $\rho_X(2i) = 1$. Iterating this argument yields $\rho_X(ni) = 1$, and letting $n$ go to infinity gives a contradiction. Finally, one cannot have $\kappa_\epsilon = 1$ because

$$\operatorname{Var}(\epsilon_t^2) = (E\epsilon_t^2)^2(\kappa_\epsilon - 1) = E\sigma_t^4\operatorname{Var}(\eta_t^2) + \operatorname{Var}(\sigma_t^2) \geq \omega^2\operatorname{Var}(\eta_t^2) > 0.$$


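Consider, by way of illustration, the ARMA(2, 1)-GARCH(1, 1) process defined by

$$\begin{cases} X_t - 0.8X_{t-1} + 0.8X_{t-2} = \epsilon_t - 0.8\epsilon_{t-1} \\ \epsilon_t = \sigma_t\eta_t, \quad \eta_t \text{ iid } \mathcal{N}(0, 1) \\ \sigma_t^2 = 1 + 0.2\epsilon_{t-1}^2 + 0.6\sigma_{t-1}^2. \end{cases} \qquad (5.8)$$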


Figure 5.5 shows the theoretical autocorrelations and partial autocorrelations for this model. The bands shown as solid lines should contain approximately 95% of the SACRs and SPACs, for a realization of size $n = 1000$ of this model. These bands are obtained from formula (B.15), the autocorrelations of $(\epsilon_t^2)$ being computed as in Section 2.5.3. The bands shown as dotted lines correspond to the standard Bartlett formula (still at the 95% level). It can be seen that using this formula, which is erroneous in the presence of GARCH, would lead to identification errors because it systematically underestimates the variability of the sample autocorrelations (Proposition 5.1).

Figure 5.5 Autocorrelations (left) and partial autocorrelations (right) for model (5.8). Approximately 95% of the SACRs (SPACs) of a realization of size n = 1000 should lie between the bands shown as solid lines. The bands shown as dotted lines correspond to the standard Bartlett formula.

Algorithm for Estimating the Generalized Bands

In practice, the autocorrelations of $(X_t)$ and $(\epsilon_t^2)$, as well as the other theoretical quantities involved in the generalized Bartlett formula (B.15), are obviously unknown. We propose the following algorithm for estimating such quantities:

1. Fit an AR($p_0$) model to the data $X_1, \dots, X_n$ using an information criterion for the selection of the order $p_0$.
2. Compute the autocorrelations $\rho_1(h)$, $h = 1, 2, \dots$, of this AR($p_0$) model.
3. Compute the residuals $\hat\epsilon_{p_0+1}, \dots, \hat\epsilon_n$ of this estimated AR($p_0$).
4. Fit an AR($p_1$) model to the squared residuals $\hat\epsilon^2_{p_0+1}, \dots, \hat\epsilon^2_n$, again using an information criterion for $p_1$.
5. Compute the autocorrelations $\rho_2(h)$, $h = 1, 2, \dots$, of this AR($p_1$) model.
6. Estimate $\lim_{n\to\infty} n\operatorname{Cov}\{\hat\rho(i), \hat\rho(j)\}$ by $\hat v_{ij} + \hat v^*_{ij}$, where
$$\hat v_{ij} = \sum_{\ell=-L_{\max}}^{L_{\max}}\Big[\rho_1(\ell)\{2\rho_1(i)\rho_1(j)\rho_1(\ell) - 2\rho_1(i)\rho_1(\ell + j) - 2\rho_1(j)\rho_1(\ell + i)\} + \rho_1(\ell + i)\{\rho_1(\ell + j) + \rho_1(\ell - j)\}\Big],$$
$$\hat v^*_{ij} = \frac{\hat\gamma_{\epsilon^2}(0)}{\hat\gamma_\epsilon^2(0)}\sum_{\ell=-L_{\max}}^{L_{\max}}\rho_2(\ell)\Big[2\rho_1(i)\rho_1(j)\rho_1^2(\ell) - 2\rho_1(j)\rho_1(\ell)\rho_1(\ell + i) - 2\rho_1(i)\rho_1(\ell)\rho_1(\ell + j) + \rho_1(\ell + i)\{\rho_1(\ell + j) + \rho_1(\ell - j)\}\Big],$$
$\hat\gamma_\epsilon(0)$ and $\hat\gamma_{\epsilon^2}(0)$ denote the sample variances of the residuals $\hat\epsilon_t$ and of their squares, and $L_{\max}$ is a truncation parameter, numerically determined so as to have $|\rho_1(\ell)|$ and $|\rho_2(\ell)|$ less than a certain tolerance (for instance, $10^{-5}$) for all $\ell > L_{\max}$.

This algorithm is fast when the Durbin–Levinson algorithm is used to fit the AR models. Figure 5.6 shows an application of this algorithm (using the BIC information criterion).
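The six steps above can be sketched in R with `ar()` (Yule–Walker, hence Durbin–Levinson) and `ARMAacf()` from the `stats` package. Two assumptions differ from the text and are stated plainly: `ar()` selects orders by AIC rather than BIC, and step 6 is evaluated through the one-sided form (5.7). By the symmetry of the autocorrelations, this form equals the two-sided sums of step 6. The truncation is a fixed $L = 200$ rather than a tolerance rule. The helper names and the simulation of model (5.8) are hypothetical illustrations.

```r
estimated_bands <- function(x, m, L = 200) {
  n <- length(x)
  # autocorrelations (lags 0..lag.max) of a fitted AR model; white noise if order 0
  ar_acf <- function(fit, lag.max) {
    if (fit$order == 0) return(c(1, rep(0, lag.max)))
    as.numeric(ARMAacf(ar = fit$ar, lag.max = lag.max))
  }
  fit1 <- ar(x, aic = TRUE, method = "yule-walker")      # step 1 (AIC here)
  r1 <- ar_acf(fit1, L + m + 1)                          # step 2
  rho1 <- function(h) r1[abs(h) + 1]
  e <- fit1$resid[!is.na(fit1$resid)]                    # step 3
  fit2 <- ar(e^2, aic = TRUE, method = "yule-walker")    # step 4
  r2 <- ar_acf(fit2, L)                                  # step 5
  rho2 <- function(h) r2[abs(h) + 1]
  km1 <- mean((e^2 - mean(e^2))^2) / mean(e^2)^2         # estimate of kappa_eps - 1
  l <- seq_len(L)
  v <- sapply(seq_len(m), function(i) {                  # step 6, via (5.7)
    w <- 2 * rho1(i) * rho1(l) - rho1(l + i) - rho1(l - i)
    sum(w^2) + km1 * sum(rho2(l) * w^2)
  })
  1.96 * sqrt(v / n)                                     # 95% bands for rho_hat(i)
}

# Hypothetical simulation of model (5.8), burn-in discarded
sim_arma_garch <- function(n, burn = 500) {
  N <- n + burn + 1
  eta <- rnorm(N); eps <- numeric(N); sig2 <- 1 / (1 - 0.2 - 0.6)
  for (t in seq_len(N)) {
    if (t > 1) sig2 <- 1 + 0.2 * eps[t - 1]^2 + 0.6 * sig2
    eps[t] <- sqrt(sig2) * eta[t]
  }
  u <- eps[-1] - 0.8 * eps[-N]                           # eps_t - 0.8 eps_{t-1}
  x <- stats::filter(u, c(0.8, -0.8), method = "recursive")
  tail(as.numeric(x), n)
}

set.seed(3)
x <- sim_arma_garch(1000)
bands <- estimated_bands(x, m = 10)
```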


Figure 5.6 SACRs (left) and SPACs (right) of a simulation of size n = 1000 of model (5.8). The dotted lines are the estimated 95% confidence bands.

5.2.2 Sample Autocorrelations of an ARMA-GARCH Process When the Noise is Not Symmetrically Distributed

The generalized Bartlett formula (B.15) holds under condition (B.13), which may not be satisfied if the distribution of the noise $\eta_t$, in the GARCH equation, is not symmetric. We shall consider the asymptotic behavior of the SACVs and SACRs for very general linear processes whose innovation $(\epsilon_t)$ is a weak white noise. Retaining the notation of Theorem B.5, the following property allows the asymptotic variance of the SACRs to be interpreted as the spectral density at 0 of a vector process (see, for instance, Brockwell and Davis, 1991, for the concept of spectral density). Let $\hat\gamma_{0:m} = (\hat\gamma(0), \dots, \hat\gamma(m))'$.

Theorem 5.3 Let $(X_t)_{t\in\mathbb{Z}}$ be a real stationary process satisfying

$$X_t = \sum_{j=-\infty}^{\infty}\psi_j\epsilon_{t-j}, \qquad \sum_{j=-\infty}^{\infty}|\psi_j| < \infty,$$

where $(\epsilon_t)_{t\in\mathbb{Z}}$ is a weak white noise such that $E\epsilon_t^4 < \infty$. Let $Y_t = X_t(X_t, X_{t+1}, \dots, X_{t+m})'$, $\Gamma_{Y^*}(h) = EY_t^*Y_{t+h}^{*\prime}$ and

$$f_{Y^*}(\lambda) = \frac{1}{2\pi}\sum_{h=-\infty}^{+\infty}e^{-ih\lambda}\Gamma_{Y^*}(h),$$

the spectral density of the process $Y^* = (Y_t^*)$, $Y_t^* = Y_t - EY_t$. Then we have

$$\lim_{n\to\infty} n\operatorname{Var}\hat\gamma_{0:m} := \Sigma_{\hat\gamma_{0:m}} = 2\pi f_{Y^*}(0). \qquad (5.9)$$

Proof. By stationarity and application of the Lebesgue dominated convergence theorem,

$$n\operatorname{Var}\hat\gamma_{0:m} + o(1) = n\operatorname{Var}\left(\frac{1}{n}\sum_{t=1}^{n}Y_t^*\right) = \sum_{h=-n+1}^{n-1}\left(1 - \frac{|h|}{n}\right)\operatorname{Cov}(Y_t^*, Y_{t+h}^*) \to \sum_{h=-\infty}^{+\infty}\Gamma_{Y^*}(h) = 2\pi f_{Y^*}(0)$$

as $n \to \infty$.


The matrix $\Sigma_{\hat\gamma_{0:m}}$ involved in (5.9) is called the long-run variance in the econometric literature, as a reminder that it is the limiting variance of a sample mean. Several methods can be considered for long-run variance estimation.

(i) The naive estimator based on replacing the $\Gamma_{Y^*}(h)$ by the $\hat\Gamma_{Y^*}(h)$ in $f_{Y^*}(0)$ is inconsistent (Exercise 1.2). However, a consistent estimator can be obtained by weighting the $\hat\Gamma_{Y^*}(h)$, using a weight close to 1 when $h$ is very small compared to $n$, and a weight close to 0 when $h$ is large. Such an estimator is called heteroscedasticity and autocorrelation consistent (HAC) in the econometric literature.

(ii) A consistent estimator of $f_{Y^*}(0)$ can also be obtained using the smoothed periodogram (see Brockwell and Davis, 1991, Section 10.4).


(iii) For a vector AR($r$),

$$A_r(B)Y_t := Y_t - \sum_{i=1}^{r}A_iY_{t-i} = Z_t, \qquad (Z_t) \text{ white noise with variance } \Sigma_Z,$$

the spectral density at 0 is

$$f_Y(0) = \frac{1}{2\pi}A_r(1)^{-1}\Sigma_Z A_r'(1)^{-1}.$$

A vector AR model is easily fitted, even a high-order AR, using a multivariate version of the Durbin–Levinson algorithm (see Brockwell and Davis, 1991, p. 422). The following method can thus be proposed:

1. Fit AR($r$) models, with $r = 0, 1, \dots, R$, to the data $Y_1 - \bar Y_n, \dots, Y_{n-m} - \bar Y_n$, where $\bar Y_n = (n - m)^{-1}\sum_{t=1}^{n-m}Y_t$.
2. Select a value $r_0$ by minimizing an information criterion, for instance the BIC.
3. Take
$$\hat\Sigma_{\hat\gamma_{0:m}} = \hat A_{r_0}(1)^{-1}\hat\Sigma_{Z}\hat A_{r_0}'(1)^{-1},$$
with obvious notation.

In our applications we used method (iii).
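For comparison, method (i) is simple to implement. The R sketch below builds $Y_t = X_t(X_t, \dots, X_{t+m})'$ as in Theorem 5.3 and applies Bartlett (Newey–West type) weights $1 - h/(b+1)$. This is not the method (iii) used in the text. The bandwidth rule $b = \lfloor 4(n/100)^{2/9}\rfloor$ is a common rule of thumb assumed here, the series is assumed centered, and `long_run_var_hac` is a hypothetical name.

```r
long_run_var_hac <- function(x, m, bw = floor(4 * (length(x) / 100)^(2 / 9))) {
  n <- length(x)
  # Row t of Y is X_t * (X_t, X_{t+1}, ..., X_{t+m}), t = 1, ..., n - m
  Y <- sapply(0:m, function(k) x[1:(n - m)] * x[(1 + k):(n - m + k)])
  Yc <- sweep(Y, 2, colMeans(Y))              # Y*_t = Y_t - mean
  N <- nrow(Yc)
  S <- crossprod(Yc) / N                       # Gamma_hat(0)
  for (h in seq_len(bw)) {
    # Gamma_hat(h) = N^{-1} sum_t Y*_t Y*_{t+h}'
    G <- crossprod(Yc[1:(N - h), , drop = FALSE], Yc[(1 + h):N, , drop = FALSE]) / N
    S <- S + (1 - h / (bw + 1)) * (G + t(G))   # Bartlett weights
  }
  S
}

set.seed(5)
x <- rnorm(2000)
S <- long_run_var_hac(x, m = 3)
```

The Bartlett kernel is chosen because it guarantees a positive semi-definite estimate; with `bw = 0` the function reduces to the sample covariance of $Y_t$ (divisor $N$).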


5.2.3 Identifying the Orders (P, Q) 


Order determination based on the sample autocorrelations and partial autocorrelations in the mixed 
ARMA(P, Q) model is not an easy task. Other methods, such as the corner method, presented in 
the next section, and the epsilon algorithm, rely on more convenient statistics. 


The Corner Method 


Denote by $D(i, j)$ the $j \times j$ Toeplitz matrix

$$D(i, j) = \begin{pmatrix} \rho_X(i) & \rho_X(i-1) & \cdots & \rho_X(i-j+1) \\ \rho_X(i+1) & \rho_X(i) & \cdots & \rho_X(i-j+2) \\ \vdots & & \ddots & \vdots \\ \rho_X(i+j-1) & \cdots & \rho_X(i+1) & \rho_X(i) \end{pmatrix}$$

and let $\Delta(i, j)$ denote its determinant. Since $\rho_X(h) - \sum_{i=1}^{P}a_i\rho_X(h-i) = 0$ for all $h > Q$, it is clear that $D(i, j)$ is not a full-rank matrix if $i > Q$ and $j > P$. More precisely, $P$ and $Q$ are minimal orders (that is, $(X_t)$ does not admit an ARMA($P'$, $Q'$) representation with $P' < P$ or $Q' < Q$) if and only if

$$\begin{cases} \Delta(i, j) = 0 & \forall i > Q \text{ and } \forall j > P, \\ \Delta(i, P) \neq 0 & \forall i \geq Q, \\ \Delta(Q, j) \neq 0 & \forall j \geq P. \end{cases} \qquad (5.10)$$

The minimal orders $P$ and $Q$ are thus characterized by the following table:

(T1)

where $\Delta(j, i)$ is at the intersection of row $i$ and column $j$, and $\times$ denotes a nonzero element. The orders $P$ and $Q$ are thus characterized by a corner of zeros in table (T1), hence the term 'corner method'. The entries in this table are easily obtained using the recursion on $j$ given by

$$\Delta(i, j)^2 = \Delta(i+1, j)\Delta(i-1, j) + \Delta(i, j+1)\Delta(i, j-1), \qquad (5.11)$$

and letting $\Delta(i, 0) = 1$, $\Delta(i, 1) = \rho_X(|i|)$.
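The corner of zeros can be checked on the theoretical autocorrelations of the ARMA(2, 1) part of model (5.8), read as $X_t = 0.8X_{t-1} - 0.8X_{t-2} + \epsilon_t - 0.8\epsilon_{t-1}$. The R sketch below computes each $\Delta(i, j)$ directly as a determinant, which is simpler to read but slower than recursion (5.11). The test block checks that (5.11) holds numerically. The function name `delta` is hypothetical, and the table is laid out with rows $j$ and columns $i$, as in the numerical illustration below.

```r
# Delta(i, j) = det D(i, j), where D(i, j)[r, s] = rho_X(i + r - s), r, s = 1..j
delta <- function(rho, i, j) {
  if (j == 0) return(1)
  idx <- outer(seq_len(j), seq_len(j), function(r, s) i + r - s)
  det(matrix(rho(idx), j, j))
}

acf_x <- ARMAacf(ar = c(0.8, -0.8), ma = -0.8, lag.max = 30)
rho <- function(h) acf_x[abs(h) + 1]          # rho_X(h) for any sign of h

# Theoretical table: row j, column i
tab <- outer(1:6, 1:6, Vectorize(function(j, i) delta(rho, i, j)))
dimnames(tab) <- list(j = 1:6, i = 1:6)
round(tab, 4)   # zeros for j >= 3 and i >= 2: the corner of an ARMA(2, 1)
```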
Denote by $\hat D(i, j)$, $\hat\Delta(i, j)$, ($\widehat{\text{T1}}$), ... the items obtained by replacing $\{\rho_X(h)\}$ by $\{\hat\rho_X(h)\}$ in $D(i, j)$, $\Delta(i, j)$, (T1), .... Only a finite number of SACRs $\hat\rho_X(1), \dots, \hat\rho_X(K)$ are available in practice, which allows $\hat\Delta(i, j)$ to be computed for $i \geq 1$, $j \geq 1$ and $i + j \leq K + 1$. Table ($\widehat{\text{T1}}$) is thus triangular. Because the $\hat\Delta(j, i)$ consistently estimate the $\Delta(j, i)$, the orders $P$ and $Q$ are characterized by a corner of small values in table ($\widehat{\text{T1}}$). However, the notion of 'small value' in ($\widehat{\text{T1}}$) is not precise enough.³

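It is preferable to consider the studentized statistics defined, for $i = -K, \dots, K$ and $j = 0, \dots, K - |i| + 1$, by

$$t(i, j) = \sqrt{n}\,\frac{\hat\Delta(i, j)}{\hat\sigma_{\hat\Delta(i, j)}}, \qquad \hat\sigma^2_{\hat\Delta(i, j)} = \frac{\partial\hat\Delta(i, j)}{\partial\rho_K'}\hat\Sigma_{\hat\rho_K}\frac{\partial\hat\Delta(i, j)}{\partial\rho_K}, \qquad (5.12)$$

where $\hat\Sigma_{\hat\rho_K}$ is a consistent estimator of the asymptotic covariance matrix of the first $K$ SACRs, which can be obtained by the algorithm of Section 5.2.1 or by that of Section 5.2.2, and where the Jacobian $\partial\hat\Delta(i, j)/\partial\rho_K' = \left(\partial\hat\Delta(i, j)/\partial\rho_X(1), \dots, \partial\hat\Delta(i, j)/\partial\rho_X(K)\right)$ is obtained from the differentiation of (5.11):

$$\frac{\partial\hat\Delta(i, 0)}{\partial\rho_X(k)} = 0 \quad\text{for } i = -K-1, \dots, K-1 \text{ and } k = 1, \dots, K;$$

$$\frac{\partial\hat\Delta(i, 1)}{\partial\rho_X(k)} = \mathbb{1}_{\{k\}}(|i|) \quad\text{for } i = -K, \dots, K \text{ and } k = 1, \dots, K;$$

$$\frac{\partial\hat\Delta(i, j+1)}{\partial\rho_X(k)} = \frac{2\hat\Delta(i, j)\dfrac{\partial\hat\Delta(i, j)}{\partial\rho_X(k)} - \hat\Delta(i+1, j)\dfrac{\partial\hat\Delta(i-1, j)}{\partial\rho_X(k)} - \hat\Delta(i-1, j)\dfrac{\partial\hat\Delta(i+1, j)}{\partial\rho_X(k)}}{\hat\Delta(i, j-1)} - \frac{\{\hat\Delta(i, j)^2 - \hat\Delta(i+1, j)\hat\Delta(i-1, j)\}\dfrac{\partial\hat\Delta(i, j-1)}{\partial\rho_X(k)}}{\hat\Delta(i, j-1)^2}$$

for $k = 1, \dots, K$, $i = -K + j, \dots, K - j$ and $j = 1, \dots, K$.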

When $\Delta(i, j) = 0$ the statistic $t(i, j)$ asymptotically follows a $\mathcal{N}(0, 1)$ (provided, in particular, that $EX_t^4$ exists). If, in contrast, $\Delta(i, j) \neq 0$ then $|t(i, j)| \to \infty$ a.s. when $n \to \infty$. We can reject the hypothesis $\Delta(i, j) = 0$ at level $\alpha$ if $|t(i, j)|$ is beyond the $(1 - \alpha/2)$-quantile of a $\mathcal{N}(0, 1)$. We can also automatically detect a corner of small values in the table, (T1) say, giving the $t(i, j)$, if no entry in this corner is greater than this $(1 - \alpha/2)$-quantile in absolute value. This practice does not correspond to any formal test at level $\alpha$, but allows a small number of plausible values to be selected for the orders $P$ and $Q$.


Illustration of the Corner Method 


For a simulation of size n = 1000 of the ARMA(2,1)-GARCH(1, 1) model (5.8) we obtain the 
following table: 


j\i     1      2      3      4      5      6      7      8      9     10     11     12
 1   17.6  -31.6  -22.6   -1.9   11.5    8.7   -0.1   -6.1   -4.2    0.5    3.5    2.1
 2   36.1   20.3   12.2    8.7    6.5    4.9    4.0    3.3    2.5    2.1    1.8
 3   -7.8   -1.6   -0.2    0.5    0.7   -0.7    0.8   -1.4    1.2   -1.1
 4    5.2    0.1    0.4    0.3    0.6   -0.1   -0.3    0.5   -0.2
 5   -3.7    0.4   -0.1   -0.5    0.4   -0.2    0.2   -0.2
 6    2.8    0.6    0.5    0.4    0.2    0.4    0.2
 7   -2.0   -0.7    0.2    0.0   -0.4   -0.3
 8    1.7    0.8    0.0    0.2    0.2
 9   -0.6   -1.2   -0.5   -0.2
10    1.4    0.9   -0.2
11   -0.2   -1.2
12    1.2

³ Comparing $\hat\Delta(i, j)$ and $\hat\Delta(i', j')$ for $j \neq j'$ (that is, entries of different rows in table ($\widehat{\text{T1}}$)) is all the more difficult as these are determinants of matrices of different sizes.


A corner of values which can be viewed as plausible realizations of the $\mathcal{N}(0, 1)$ can be observed. This corner corresponds to the rows 3, 4, ... and the columns 2, 3, ..., leading us to select the ARMA(2, 1) model. The automatic detection routine for corners of small values gives:


ARMA(P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL
PROBA      CRIT   MODELS FOUND
0.200000   1.28   ( 2, 8)  ( 3, 1)  (10, 0)
0.100000   1.64   ( 2, 1)  ( 8, 0)
0.050000   1.96   ( 1,10)  ( 2, 1)  ( 7, 0)
0.020000   2.33   ( 0,11)  ( 1, 9)  ( 2, 1)  ( 6, 0)
0.010000   2.58   ( 0,11)  ( 1, 8)  ( 2, 1)  ( 6, 0)
0.005000   2.81   ( 0,11)  ( 1, 8)  ( 2, 1)  ( 5, 0)
0.002000   3.09   ( 0,11)  ( 1, 8)  ( 2, 1)  ( 5, 0)
0.001000   3.29   ( 0,11)  ( 1, 8)  ( 2, 1)  ( 5, 0)
0.000100   3.72   ( 0, 9)  ( 1, 7)  ( 2, 1)  ( 5, 0)
0.000010   4.26   ( 0, 8)  ( 1, 6)  ( 2, 1)  ( 4, 0)


We retrieve the orders (P, Q) = (2, 1) of the simulated model, but also other plausible orders. This is not surprising, since the ARMA(2, 1) model can be well approximated by other ARMA models, such as an AR(6), an MA(11) or an ARMA(1, 8). In practice, however, the ARMA(2, 1) should be preferred for reasons of parsimony.
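The automatic detection step can be sketched as follows. The routine that produced the output above is not described, so the rule used here is an assumption. A corner (P, Q), covering rows $j \ge P + 1$ and columns $i \ge Q + 1$, is accepted when it is non-empty and all its entries satisfy $|t(i, j)| \le c$. Only the minimal accepted corners are reported, meaning those for which no other accepted $(P', Q')$ has $P' \le P$ and $Q' \le Q$. Applied to the table above, this rule reproduces the listed orders at the levels checked below. The names `T1`, `corner_is_small` and `find_corners` are illustrative.

```python
from statistics import NormalDist

# Studentized statistics t(i, j) of the ARMA(2,1)-GARCH(1,1) example:
# row j (j = 1..12), column i (i = 1..13 - j).
T1 = [
    [17.6, -31.6, -22.6, -1.9, 11.5, 8.7, -0.1, -6.1, -4.2, 0.5, 3.5, 2.1],
    [36.1, 20.3, 12.2, 8.7, 6.5, 4.9, 4.0, 3.3, 2.5, 2.1, 1.8],
    [-7.8, -1.6, -0.2, 0.5, 0.7, -0.7, 0.8, -1.4, 1.2, -1.1],
    [5.2, 0.1, 0.4, 0.3, 0.6, -0.1, -0.3, 0.5, -0.2],
    [-3.7, 0.4, -0.1, -0.5, 0.4, -0.2, 0.2, -0.2],
    [2.8, 0.6, 0.5, 0.4, 0.2, 0.4, 0.2],
    [-2.0, -0.7, 0.2, 0.0, -0.4, -0.3],
    [1.7, 0.8, 0.0, 0.2, 0.2],
    [-0.6, -1.2, -0.5, -0.2],
    [1.4, 0.9, -0.2],
    [-0.2, -1.2],
    [1.2],
]


def corner_is_small(table, p, q, crit):
    """True if the corner {rows j >= p+1, columns i >= q+1} is non-empty
    and all of its entries are at most crit in absolute value."""
    cells = [v for j, row in enumerate(table, start=1) if j >= p + 1
             for i, v in enumerate(row, start=1) if i >= q + 1]
    return bool(cells) and all(abs(v) <= crit for v in cells)


def find_corners(table, crit):
    """Minimal (P, Q) pairs whose corner of small values is accepted."""
    k = len(table)
    valid = [(p, q) for p in range(k) for q in range(k)
             if corner_is_small(table, p, q, crit)]
    return [(p, q) for (p, q) in valid
            if not any((pp, qq) != (p, q) and pp <= p and qq <= q
                       for (pp, qq) in valid)]


if __name__ == "__main__":
    for alpha in (0.2, 0.1, 0.05, 0.01):
        crit = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-quantile of N(0, 1)
        print(f"{alpha:.6f} {crit:.2f}", find_corners(T1, crit))
```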


5.3 Identifying the GARCH Orders of an ARMA-GARCH Model


The Box–Jenkins methodology described in Chapter 1 for ARMA models can be adapted to GARCH(p, q) models. In this section we consider only the identification problem. First suppose that the observations are drawn from a pure GARCH. The choice of a small number of plausible values for the orders p and q can be achieved in several steps, using various tools:

(i) inspection of the sample autocorrelations and sample partial autocorrelations of $\varepsilon_1^2, \dots, \varepsilon_n^2$;
(ii) inspection of statistics that are functions of the sample autocovariances of $\varepsilon_t^2$ (corner method, epsilon algorithm, ...);
(iii) use of information criteria (AIC, BIC, ...);
(iv) tests of the significance of certain coefficients;
(v) analysis of the residuals.


Steps (iii) and (v), and to a large extent step (iv), require the estimation of models and are used to validate or modify them. Estimation of GARCH models will be studied in detail in the forthcoming chapters. Step (i) relies on the ARMA representation for the square of a GARCH process. In particular, if $(\varepsilon_t)$ is an ARCH(q) process, then the theoretical partial autocorrelation function $r_{\varepsilon^2}(\cdot)$ of $(\varepsilon_t^2)$ satisfies

$$r_{\varepsilon^2}(h) = 0 \quad \forall h > q.$$
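Step (i) can be sketched as follows. The partial autocorrelations are obtained from the (sample) autocorrelations of the squares by the Durbin–Levinson recursion. For an ARCH(q) with finite fourth moment, $(\varepsilon_t^2)$ is an AR(q), so these partial autocorrelations should be negligible beyond lag q. This is a minimal NumPy sketch. The function names, the ARCH(2) parameters and the seed are illustrative assumptions.

```python
import numpy as np


def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations r(1), ..., r(max_lag) from autocorrelations
    rho(0), ..., rho(max_lag), via the Durbin-Levinson recursion."""
    pacf = np.zeros(max_lag)
    phi = np.zeros(0)  # phi[k-1] = coefficient of lag k in the order-(h-1) predictor
    for h in range(1, max_lag + 1):
        num = rho[h] - np.dot(phi, rho[h - 1:0:-1])   # rho(h) - sum_k phi_k rho(h-k)
        den = 1.0 - np.dot(phi, rho[1:h])              # 1 - sum_k phi_k rho(k)
        a = num / den                                  # phi_{h,h} = r(h)
        phi = np.concatenate([phi - a * phi[::-1], [a]])
        pacf[h - 1] = a
    return pacf


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(xc, xc)
    return np.array([1.0] + [np.dot(xc[h:], xc[:-h]) / denom for h in range(1, max_lag + 1)])


if __name__ == "__main__":
    # Illustrative ARCH(2): sigma_t^2 = 1 + 0.3 eps_{t-1}^2 + 0.2 eps_{t-2}^2, eta_t iid N(0, 1).
    rng = np.random.default_rng(0)
    n = 5000
    eps = np.zeros(n)
    for t in range(2, n):
        eps[t] = np.sqrt(1.0 + 0.3 * eps[t - 1] ** 2 + 0.2 * eps[t - 2] ** 2) * rng.standard_normal()
    print(np.round(pacf_from_acf(sample_acf(eps**2, 10), 10), 3))
```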


For mixed models, the corner method can be used. 


5.3.1 Corner Method in the GARCH Case 


To identify the orders of a GARCH(p, q) process, one can use the fact that $(\varepsilon_t^2)$ follows an ARMA(P, Q) with $P = \max(p, q)$ and $Q = p$. In the case of a pure GARCH, $(\varepsilon_t) = (X_t)$ is observed. The asymptotic variance of the SACRs of $\varepsilon_1^2, \dots, \varepsilon_n^2$ can be estimated by the method described in Section 5.2.2. The table of studentized statistics for the corner method follows, as described in the previous section. The problem is then to detect at least one corner of normal values starting from the row $P + 1$ and the column $Q + 1$ of the table, under the constraints $P \ge 1$ (because $\max(p, q) \ge q \ge 1$) and $P \ge Q$. This leads to the selection of GARCH(p, q) models such that $(p, q) = (Q, P)$ when $Q < P$, and $(p, q) = (Q, 1), (Q, 2), \dots, (Q, P)$ when $Q = P$.
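The translation from a detected ARMA corner (P, Q) on the squares to candidate GARCH orders is mechanical. The small helper below encodes the rule just stated. Its name is illustrative.

```python
def garch_orders_from_corner(P, Q):
    """GARCH(p, q) orders compatible with an ARMA(P, Q) corner found on the
    squares, using P = max(p, q) and Q = p (so P >= 1 and P >= Q)."""
    if P < 1 or Q < 0 or Q > P:
        return []  # no GARCH(p, q) with q >= 1 yields this (P, Q)
    if Q < P:
        return [(Q, P)]  # p = Q < P forces q = P
    return [(Q, q) for q in range(1, P + 1)]  # Q = P: p = P and q = 1, ..., P


if __name__ == "__main__":
    for corner in [(2, 2), (4, 1), (13, 0)]:
        print(corner, "->", garch_orders_from_corner(*corner))
```

For instance, the corners (2, 2), (4, 1) and (13, 0) give the orders (2, 1) and (2, 2), then (1, 4), then (0, 13). These orders appear among those selected in the pure GARCH application.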

In the ARMA-GARCH case the $\varepsilon_t$ are unobserved but can be approximated by the ARMA residuals. Alternatively, to avoid the ARMA estimation, residuals from fitted ARs, as described in steps 1 and 3 of the algorithm of Section 5.2.1, can be used.


5.3.2 Applications 
A Pure GARCH 
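Consider a simulation of size n = 5000 of the GARCH(2, 1) model

$$\varepsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta_1\sigma_{t-1}^2 + \beta_2\sigma_{t-2}^2, \qquad (5.13)$$

where $(\eta_t)$ is a sequence of iid $\mathcal N(0, 1)$ variables, $\omega = 1$, $\alpha = 0.1$, $\beta_1 = 0.05$ and $\beta_2 = 0.8$.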
The table of studentized statistics for the corner method is as follows: 


max(p,q)\p    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
  1         5.3   2.9   5.1   2.2   5.3   5.9   3.6     ?   2.9   2.9   3.4   1.4   3.8   2.4   3.0
  2        -2.4  -3.5   2.4  -4.4   2.2  -0.7   0.6  -0.7  -0.3   0.4   1.1  -2.5   2.8  -0.2
  3         4.9   2.4   0.7   1.7   0.7  -0.8   0.2   0.4   0.3   0.3   0.7   1.4   1.4
  4        -0.4  -4.3  -1.8  -0.6   1.0  -0.6   0.4  -0.4   0.5  -0.6     ?     ?
  5         4.6   2.4   0.6   0.9   0.8   0.5   0.3  -0.4  -0.5   0.5     ?
  6        -3.1  -1.7   1.4     ?  -0.3   0.3   0.3  -0.5   0.5   0.4
  7         3.1   1.2   0.3   0.6   0.3   0.2   0.5   0.1  -0.7
  8        -1.0  -1.3  -0.7  -0.5   0.8  -0.5   0.3  -0.6
  9           ?     ?     ?     ?     ?     ?  -0.7
 10        -1.7   0.1   0.3  -0.7  -0.6   0.5
 11         1.8   2.2   0.6   0.7  -1.0
 12         1.6  -1.3  -1.4  -1.1
 13         4.2   2.3   1.4
 14        -1.2  -0.6
 15           ?


A corner of plausible $\mathcal N(0, 1)$ values is observed starting from the row $P + 1 = 3$ and the column $Q + 1 = 3$. This corresponds to GARCH(p, q) models such that $(\max(p, q), p) = (2, 2)$, that is, $(p, q) = (2, 1)$ or $(p, q) = (2, 2)$. A small number of other plausible values are detected for $(p, q)$.


GARCH (p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL
PROBA      CRIT   MODELS FOUND
0.200000   1.28   ( 3, 1)  ( 3, 2)  ( 3, 3)  ( 2,13)
0.100000   1.64   ( 3, 1)  ( 3, 2)  ( 3, 3)  ( 2, 4)  ( 0,13)
0.050000   1.96   ( 2, 1)  ( 2, 2)  ( 0,13)
0.020000   2.33   ( 2, 1)  ( 2, 2)  ( 1, 5)  ( 0,13)
0.010000   2.58   ( 2, 1)  ( 2, 2)  ( 1, 4)  ( 0,13)
0.005000   2.81   ( 2, 1)  ( 2, 2)  ( 1, 4)  ( 0,13)
0.002000   3.09   ( 2, 1)  ( 2, 2)  ( 1, 4)  ( 0,13)
0.001000   3.29   ( 2, 1)  ( 2, 2)  ( 1, 4)  ( 0,13)
0.000100   3.72   ( 2, 1)  ( 2, 2)  ( 1, 4)  ( 0,13)
0.000010   4.26   ( 2, 1)  ( 2, 2)  ( 1, 4)  ( 0, 5)
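A simulation of model (5.13) and the SACRs of $\varepsilon_t^2$, which are the inputs of the corner method, can be obtained along the following lines. This is a minimal NumPy sketch. The seed, the burn-in length and the initialization at the unconditional variance $\omega/(1 - \alpha - \beta_1 - \beta_2)$ are arbitrary choices, so the resulting statistics will differ from the table above.

```python
import numpy as np


def simulate_garch21(n, omega=1.0, alpha=0.1, beta1=0.05, beta2=0.8, burn=1000, seed=0):
    """Simulate eps_t = sigma_t eta_t with
    sigma_t^2 = omega + alpha eps_{t-1}^2 + beta1 sigma_{t-1}^2 + beta2 sigma_{t-2}^2,
    eta_t iid N(0, 1); the first `burn` values are discarded."""
    rng = np.random.default_rng(seed)
    total = n + burn
    eta = rng.standard_normal(total)
    eps = np.zeros(total)
    sig2 = np.zeros(total)
    uncond = omega / (1.0 - alpha - beta1 - beta2)  # starting value: unconditional variance
    sig2[:2] = uncond
    eps[:2] = np.sqrt(uncond) * eta[:2]
    for t in range(2, total):
        sig2[t] = omega + alpha * eps[t - 1] ** 2 + beta1 * sig2[t - 1] + beta2 * sig2[t - 2]
        eps[t] = np.sqrt(sig2[t]) * eta[t]
    return eps[burn:], sig2[burn:]


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag) of x."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[h:], xc[:-h]) / denom for h in range(1, max_lag + 1)])


if __name__ == "__main__":
    eps, _ = simulate_garch21(5000)
    print(np.round(sample_acf(eps**2, 10), 3))
```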


An ARMA-GARCH 


Let us return to the simulation of size n = 1000 of the ARMA(2, 1)-GARCH(1, 1) model (5.8). The table of studentized statistics for the corner method, applied to the SACRs of the observed process, was presented in Section 5.2.3. A small number of ARMA models, including the ARMA(2, 1), were selected. Let $\hat\varepsilon_{p_0+1}, \dots, \hat\varepsilon_n$ denote the residuals when an AR($p_0$) is fitted to the observations, the order $p_0$ being selected using an information criterion.⁴ We apply the corner method again, this time on the SACRs of the squared residuals $\hat\varepsilon_{p_0+1}^2, \dots, \hat\varepsilon_n^2$, and estimate the covariances between the SACRs by the multivariate AR spectral approximation described in Section 5.2.2. We obtain the following table:


max(p,q)\p    1     2     3     4     5     6     7     8     9    10    11    12
  1         4.5     ?   3.5   2.1   1.1   2.1   1.2   1.0   0.7   0.4  -0.2   0.9
  2        -2.7   0.3  -0.2   0.1  -0.4   0.5  -0.2   0.2  -0.1   0.4  -0.2
  3         1.4  -0.2   0.0  -0.2   0.2   0.3  -0.2   0.1  -0.2   0.1
  4        -0.9   0.1   0.2   0.2  -0.2   0.2   0.0  -0.2  -0.1
  5         0.3  -0.4   0.2  -0.2     ?   0.1  -0.1   0.1
  6        -0.7   0.4  -0.2   0.2  -0.1   0.1  -0.1
  7         0.0  -0.1  -0.2   0.1  -0.1  -0.2
  8        -0.1   0.1  -0.1  -0.2  -0.1
  9        -0.3   0.1  -0.1  -0.1
 10         0.1  -0.2  -0.1
 11        -0.4   0.2
 12        -1.0


A corner of values compatible with the $\mathcal N(0, 1)$ is observed starting from row 2 and column 2, which corresponds to a GARCH(1, 1) model. Another corner can be seen below row 2, which corresponds to a GARCH(0, 2) = ARCH(2) model. In practice, at least these two models would be selected in this identification step. The next step would be the estimation of the selected models, followed by a validation step involving testing the significance of the coefficients, examining the residuals and comparing the models via information criteria. This validation step allows a final model to be retained, which can then be used for prediction purposes.
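The preliminary AR fit used above can be sketched as follows. The text does not say which information criterion was used, so AIC is assumed here. All orders are fitted by least squares on a common sample so that their AIC values are comparable. The squared residuals then feed the corner method. The function names and the illustrative AR(1) series are assumptions.

```python
import numpy as np


def ar_design(x, p, start):
    """Regressors x_{t-1}, ..., x_{t-p} for t = start, ..., n-1 (requires start >= p)."""
    n = len(x)
    return np.column_stack([x[start - k:n - k] for k in range(1, p + 1)])


def ar_residuals_aic(x, max_order=10):
    """Fit AR(p), p = 0..max_order, by least squares on the common sample
    t = max_order, ..., n-1 and keep the order p0 minimizing
    AIC(p) = m log(mean squared residual) + 2p (m = number of residuals)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = x[max_order:]
    m = len(y)
    best = None
    for p in range(max_order + 1):
        if p == 0:
            coef, res = np.zeros(0), y.copy()
        else:
            X = ar_design(x, p, max_order)
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            res = y - X @ coef
        aic = m * np.log(np.mean(res**2)) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, p, coef, res)
    return best[1], best[2], best[3]


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag)."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[h:], xc[:-h]) / denom for h in range(1, max_lag + 1)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 2000
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):  # an AR(1) with iid noise, for illustration only
        x[t] = 0.5 * x[t - 1] + e[t]
    p0, coef, res = ar_residuals_aic(x, max_order=8)
    print("p0 =", p0, "coefficients:", np.round(coef, 3))
    print("SACRs of squared residuals:", np.round(sample_acf(res**2, 5), 3))
```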


4 One can also use the innovations algorithm of Brockwell and Davis (1991, p. 172) for rapid fitting of 
MA models. Alternatively, one of the previously selected ARMA models, for instance the ARMA(2, 1), can 
be used to approximate the innovations. 
GARCH (p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL
PROBA      CRIT   MODELS FOUND
0.200000   1.28   ( 1, 1)  ( 0, 3)
0.100000   1.64   ( 1, 1)  ( 0, 2)
0.050000   1.96   ( 1, 1)  ( 0, 2)
0.020000   2.33   ( 1, 1)  ( 0, 2)
0.010000   2.58   ( 1, 1)  ( 0, 2)
0.005000   2.81   ( 0, 1)
0.002000   3.09   ( 0, 1)
0.001000   3.29   ( 0, 1)
0.000100   3.72   ( 0, 1)
0.000010   4.26   ( 0, 1)


5.4 Lagrange Multiplier Test for Conditional Homoscedasticity


To test linear restrictions on the parameters of a model, the most widely used tests are the Wald test, the Lagrange multiplier (LM) test and the likelihood ratio (LR) test. The LM test, also referred to as the Rao test or the score test, is attractive because it only requires estimation of the restricted model, which is often much easier than estimating the unrestricted model. The Wald and LR tests, which will be studied in Chapter 8, do not have this advantage. We start by deriving the general form of the LM test. Then we present an LM test for conditional homoscedasticity in Section 5.4.2.


5.4.1 General Form of the LM Test 


Consider a parametric model, with true parameter value $\theta_0 \in \mathbb R^d$, and a null hypothesis

$$H_0:\ R\theta_0 = r,$$

where $R$ is a given $s \times d$ matrix of full rank $s$, and $r$ is a given $s \times 1$ vector. This formulation allows one to test, for instance, whether the first $s$ components of $\theta_0$ are null (it suffices to set $R = [I_s : 0_{s\times(d-s)}]$ and $r = 0_s$). Let $\ell_n(\theta)$ denote the log-likelihood of observations $X_1, \dots, X_n$. We assume the existence of unconstrained and constrained (by $H_0$) maximum likelihood estimators, respectively satisfying

$$\hat\theta = \arg\sup_{\theta}\ell_n(\theta) \quad\text{and}\quad \hat\theta^c = \arg\sup_{\theta:\,R\theta = r}\ell_n(\theta).$$


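Under some regularity assumptions (which will be discussed in detail in Chapter 7 for the GARCH(p, q) model), the score vector satisfies a central limit theorem and we have

$$\frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\theta_0) \xrightarrow{d} \mathcal N(0, \mathcal I) \quad\text{and}\quad \sqrt n(\hat\theta - \theta_0) \xrightarrow{d} \mathcal N(0, \mathcal I^{-1}), \qquad (5.14)$$

where $\mathcal I$ is the Fisher information matrix. To derive the constrained estimator we introduce the Lagrangian

$$\mathcal L(\theta, \lambda) = \ell_n(\theta) - \lambda'(R\theta - r).$$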


We have

$$(\hat\theta^c, \hat\lambda) = \arg\sup_{(\theta,\lambda)}\mathcal L(\theta, \lambda).$$

The first-order conditions give

$$R'\hat\lambda = \frac{\partial}{\partial\theta}\ell_n(\hat\theta^c) \quad\text{and}\quad R\hat\theta^c = r.$$


The second convergence in (5.14) thus shows that under $H_0$,

$$\sqrt n R(\hat\theta - \hat\theta^c) = \sqrt n R(\hat\theta - \theta_0) \xrightarrow{d} \mathcal N(0, R\mathcal I^{-1}R'). \qquad (5.15)$$


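Using the convention $a \overset{c}{=} b$ for $a = b + c$, asymptotic expansions entail, under the usual regularity conditions (more rigorous statements will be given in Chapter 7),

$$0 = \frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\hat\theta) \overset{o_P(1)}{=} \frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\theta_0) - \mathcal I\sqrt n(\hat\theta - \theta_0),$$

$$\frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c) \overset{o_P(1)}{=} \frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\theta_0) - \mathcal I\sqrt n(\hat\theta^c - \theta_0).$$

Subtracting these two expansions and using the first-order conditions gives

$$\sqrt n(\hat\theta - \hat\theta^c) \overset{o_P(1)}{=} \mathcal I^{-1}\frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c) = \mathcal I^{-1}R'\frac{\hat\lambda}{\sqrt n}. \qquad (5.16)$$

Finally, (5.15) and (5.16) imply

$$\frac{\hat\lambda}{\sqrt n} \overset{o_P(1)}{=} (R\mathcal I^{-1}R')^{-1}\sqrt n R(\hat\theta - \hat\theta^c) \xrightarrow{d} \mathcal N\left\{0, (R\mathcal I^{-1}R')^{-1}\right\}$$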


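and then

$$\frac1n\hat\lambda'R\mathcal I^{-1}R'\hat\lambda \xrightarrow{d} \chi^2_s.$$

Thus, under $H_0$, the test statistic

$$\mathrm{LM}_n = \frac1n\hat\lambda'R\hat{\mathcal I}^{-1}R'\hat\lambda = \frac1n\frac{\partial}{\partial\theta'}\ell_n(\hat\theta^c)\,\hat{\mathcal I}^{-1}\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c) \qquad (5.17)$$

asymptotically follows a $\chi^2_s$, provided that $\hat{\mathcal I}$ is an estimator converging in probability to $\mathcal I$. In general one can take

$$\hat{\mathcal I} = -\frac1n\frac{\partial^2\ell_n(\hat\theta^c)}{\partial\theta\,\partial\theta'}.$$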


The critical region of the LM test at the asymptotic level $\alpha$ is $\{\mathrm{LM}_n > \chi^2_s(1 - \alpha)\}$.
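A minimal worked instance of (5.17) is given below. It was chosen for illustration and is not taken from the text. For iid $\mathcal N(\mu, \sigma^2)$ observations and $H_0: \mu = 0$, we have $\theta = (\mu, \sigma^2)'$, $R = [1 : 0]$ and $\hat\sigma^{c2} = n^{-1}\sum_t X_t^2$. The score with respect to $\sigma^2$ vanishes at $\hat\theta^c$. Taking $\hat{\mathcal I} = \mathrm{diag}(1/\hat\sigma^{c2}, 1/(2\hat\sigma^{c4}))$, which is the Fisher information evaluated at $\hat\theta^c$ (an assumption in place of the Hessian-based choice above), the statistic reduces to $\mathrm{LM}_n = n\bar X^2/\hat\sigma^{c2}$.

```python
import numpy as np


def lm_test_zero_mean(x):
    """LM statistic (5.17) for H0: mu = 0 in an iid N(mu, sigma^2) sample,
    with the Fisher information evaluated at the constrained estimator."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma2_c = np.mean(x**2)                                       # constrained MLE of sigma^2
    score = np.array([x.sum() / sigma2_c, 0.0])                    # d l_n / d(mu, sigma^2) at theta_c
    info = np.diag([1.0 / sigma2_c, 1.0 / (2.0 * sigma2_c**2)])    # information per observation
    return float(score @ np.linalg.solve(info, score) / n)


if __name__ == "__main__":
    lm = lm_test_zero_mean([1.0, 2.0, 3.0])
    # 5% critical value of a chi-square(1): 3.841
    print(round(lm, 4), "reject H0" if lm > 3.841 else "do not reject H0")
```

For the toy sample (1, 2, 3), $\mathrm{LM}_n = 3 \cdot 2^2/(14/3) = 36/14 \approx 2.571$. This is below the 5% critical value 3.841 of the $\chi^2_1$.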


The Case where the LM_n Statistic Takes the Form nR²


Implementation of an LM test can sometimes be extremely simple. Consider a nonlinear conditionally homoscedastic model in which a dependent variable $Y_t$ is related to its past values and to a vector of exogenous variables $X_t$ by $Y_t = F_{\theta_0}(W_t) + \varepsilon_t$, where $\varepsilon_t$ is iid $(0, \sigma_0^2)$ and $W_t = (X_t, Y_{t-1}, \dots)$. Assume, in addition, that $W_t$ and $\varepsilon_t$ are independent. We wish to test the hypothesis

$$H_0:\ \psi_0 = 0,$$

where

$$\theta_0 = \begin{pmatrix}\beta_0\\ \psi_0\end{pmatrix}, \qquad \beta_0 \in \mathbb R^{d-s}, \quad \psi_0 \in \mathbb R^s.$$
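To retrieve the framework of the previous section, let $R = [0_{s\times(d-s)} : I_s]$ and note that

$$\frac{\partial}{\partial\psi}\ell_n(\hat\theta^c) = R\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c) = RR'\hat\lambda = \hat\lambda \quad\text{and}\quad \frac{1}{\sqrt n}\hat\lambda \xrightarrow{d} \mathcal N(0, \Sigma_\lambda),$$

where $\Sigma_\lambda = (\mathcal I^{22})^{-1}$ and $\mathcal I^{22} = R\mathcal I^{-1}R'$ is the bottom right-hand block of $\mathcal I^{-1}$. Suppose that $\sigma_0^2 = \sigma^2(\beta_0)$ does not depend on $\psi_0$. With a Gaussian likelihood (Exercise 5.9) we have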


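$$\frac{1}{\sqrt n}\hat\lambda = \frac{1}{\sqrt n\,\hat\sigma^{c2}}\sum_{t=1}^n\varepsilon_t(\hat\theta^c)\frac{\partial}{\partial\psi}F_{\hat\theta^c}(W_t) = \frac{1}{\sqrt n\,\hat\sigma^{c2}}F_\psi'\hat U^c,$$

where $\varepsilon_t(\theta) = Y_t - F_\theta(W_t)$, $\hat\sigma^{c2} = \sigma^2(\hat\beta^c)$, $\hat\varepsilon_t^c = \varepsilon_t(\hat\theta^c)$, $\hat U^c = (\hat\varepsilon_1^c, \dots, \hat\varepsilon_n^c)'$ and

$$F_\psi = \begin{pmatrix}\partial F_{\hat\theta^c}(W_1)/\partial\psi'\\ \vdots\\ \partial F_{\hat\theta^c}(W_n)/\partial\psi'\end{pmatrix}.$$

Partition $\mathcal I$ into blocks as

$$\mathcal I = \begin{pmatrix}\mathcal I_{11} & \mathcal I_{12}\\ \mathcal I_{21} & \mathcal I_{22}\end{pmatrix},$$

where $\mathcal I_{11}$ and $\mathcal I_{22}$ are square matrices of respective sizes $d - s$ and $s$. Under the assumption that the information matrix $\mathcal I$ is block-diagonal (that is, $\mathcal I_{12} = 0$), we have $\mathcal I^{22} = \mathcal I_{22}^{-1}$ where $\mathcal I_{22} = R\mathcal I R'$, which entails $\Sigma_\lambda = \mathcal I_{22}$. We can then choose

$$\hat\Sigma_\lambda = \frac{1}{\hat\sigma^{c4}}\,\frac1n\hat U^{c\prime}\hat U^c\,\frac1n F_\psi'F_\psi = \frac{1}{\hat\sigma^{c2}}\,\frac1n F_\psi'F_\psi$$

as a consistent estimator of $\Sigma_\lambda$. We end up with

$$\mathrm{LM}_n = n\,\frac{\hat U^{c\prime}F_\psi(F_\psi'F_\psi)^{-1}F_\psi'\hat U^c}{\hat U^{c\prime}\hat U^c}, \qquad (5.18)$$

which is nothing other than $n$ times the uncentered determination coefficient in the regression of $\hat\varepsilon_t^c$ on the variables $\partial F_{\hat\theta^c}(W_t)/\partial\psi_i$ for $i = 1, \dots, s$ (Exercise 5.10).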


LM Test with Auxiliary Regressions 


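We extend the previous framework by allowing $\mathcal I_{12}$ to be nonzero. Assume that $\sigma^2$ does not depend on $\theta$. In view of Exercise 5.9, we can then estimate $\Sigma_\lambda$ by⁵

$$\hat\Sigma_\lambda = \frac{1}{\hat\sigma^{c2}}\,\frac1n\left\{F_\psi'F_\psi - F_\psi'F_\beta(F_\beta'F_\beta)^{-1}F_\beta'F_\psi\right\},$$

where

$$F_\beta = \begin{pmatrix}\partial F_{\hat\theta^c}(W_1)/\partial\beta'\\ \vdots\\ \partial F_{\hat\theta^c}(W_n)/\partial\beta'\end{pmatrix}.$$

Suppose the model is linear under the constraint $H_0$, so that

$$\hat U^c = Y - F_\beta\hat\beta^c \quad\text{and}\quad \hat\sigma^{c2} = \hat U^{c\prime}\hat U^c/n,$$

with $Y = (Y_1, \dots, Y_n)'$ and $\hat\beta^c = (F_\beta'F_\beta)^{-1}F_\beta'Y$, up to some negligible terms.

5 For a partitioned invertible matrix $A = \begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}$, where $A_{11}$ and $A_{22}$ are invertible square blocks, the bottom right-hand block of $A^{-1}$ is written as $A^{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$ (Exercise 6.7).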
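Now consider the linear regression

$$Y = F_\beta\beta^* + F_\psi\psi^* + U. \qquad (5.19)$$

Exercise 5.10 shows that, in this auxiliary regression, the LM statistic for testing the hypothesis

$$H_0^*:\ \psi^* = 0$$

is given by

$$\mathrm{LM}^* = n^{-1}(\hat\sigma^c)^{-4}\hat U^{c\prime}F_\psi\hat\Sigma_\lambda^{-1}F_\psi'\hat U^c = (\hat\sigma^c)^{-2}\hat U^{c\prime}F_\psi\left\{F_\psi'F_\psi - F_\psi'F_\beta(F_\beta'F_\beta)^{-1}F_\beta'F_\psi\right\}^{-1}F_\psi'\hat U^c.$$

This statistic is precisely the LM test statistic for the hypothesis $H_0: \psi = 0$ in the initial model.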

From Exercise 5.10, the LM test statistic of the hypothesis $H_0^*: \psi^* = 0$ in model (5.19) can also be written as

$$\mathrm{LM}^* = n\,\frac{\hat U^{c\prime}\hat U^c - \hat U'\hat U}{\hat U^{c\prime}\hat U^c}, \qquad (5.20)$$

where $\hat U = Y - F_\beta\hat\beta^* - F_\psi\hat\psi^* =: Y - F\hat\theta^*$, with $F = [F_\beta : F_\psi]$ and $\hat\theta^* = (F'F)^{-1}F'Y$. We finally obtain the so-called Breusch–Godfrey form of the LM statistic by interpreting $\mathrm{LM}^*$ in (5.20) as $n$ times the determination coefficient of the auxiliary regression

$$\hat U^c = F_\beta\gamma + F_\psi\psi^{**} + V, \qquad (5.21)$$

where $\hat U^c$ is the vector of residuals in the regression of $Y$ on the columns of $F_\beta$.

Indeed, in the two regressions (5.19) and (5.21), the vector of residuals is $\hat V = \hat U$, because $\hat\beta^* = \hat\beta^c + \hat\gamma$ and $\hat\psi^* = \hat\psi^{**}$. Finally, we note that the determination coefficient is centered (in other words, it is the $R^2$ provided by standard statistical software) when a column of $F_\beta$ is constant.
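The equality between (5.20) and $n$ times the $R^2$ of the auxiliary regression (5.21) is exact, and it can be checked numerically. The sketch below assumes that $F_\beta$ contains a constant column, so that the centered and uncentered coefficients coincide. It uses simulated regressors that are purely illustrative.

```python
import numpy as np


def ols_resid(y, X):
    """Least-squares residuals of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def lm_two_ways(y, F_beta, F_psi):
    """LM* computed (a) from (5.20) and (b) as n times the centered R^2 of the
    auxiliary regression (5.21); F_beta is assumed to contain a constant column."""
    n = len(y)
    F = np.column_stack([F_beta, F_psi])
    u_c = ols_resid(y, F_beta)      # restricted residuals U^c
    u = ols_resid(y, F)             # unrestricted residuals U
    lm_520 = n * (u_c @ u_c - u @ u) / (u_c @ u_c)
    v = ols_resid(u_c, F)           # residuals V of the auxiliary regression (equal to U)
    r2 = 1.0 - (v @ v) / np.sum((u_c - u_c.mean()) ** 2)
    return lm_520, n * r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    F_beta = np.column_stack([np.ones(n), rng.standard_normal(n)])
    F_psi = rng.standard_normal((n, 2))
    y = F_beta @ np.array([1.0, 0.5]) + 0.3 * F_psi[:, 0] + rng.standard_normal(n)
    print(lm_two_ways(y, F_beta, F_psi))
```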


Quasi-LM Test 


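When $\ell_n(\theta)$ is no longer supposed to be the log-likelihood, but only the quasi-log-likelihood (a thorough study of the quasi-likelihood for GARCH models will be made in Chapter 7), the convergences (5.14) can in general be replaced by

$$\frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\ell_n(\theta_0) \xrightarrow{d} \mathcal N(0, \mathcal I) \quad\text{and}\quad \sqrt n(\hat\theta - \theta_0) \xrightarrow{d} \mathcal N(0, \mathcal J^{-1}\mathcal I\mathcal J^{-1}), \qquad (5.22)$$

where

$$\mathcal I = \lim_{n\to\infty}\frac1n\operatorname{Var}\frac{\partial}{\partial\theta}\ell_n(\theta_0), \qquad \mathcal J = \lim_{n\to\infty} -\frac1n\frac{\partial^2}{\partial\theta\,\partial\theta'}\ell_n(\theta_0) \quad\text{a.s.}$$

It is then recommended that (5.17) be replaced by the more complex, but more robust, expression

$$\mathrm{LM}_n = \frac1n\hat\lambda'R\hat{\mathcal J}^{-1}R'\left(R\hat{\mathcal J}^{-1}\hat{\mathcal I}\hat{\mathcal J}^{-1}R'\right)^{-1}R\hat{\mathcal J}^{-1}R'\hat\lambda = \frac1n\frac{\partial}{\partial\theta'}\ell_n(\hat\theta^c)\,\hat{\mathcal J}^{-1}R'\left(R\hat{\mathcal J}^{-1}\hat{\mathcal I}\hat{\mathcal J}^{-1}R'\right)^{-1}R\hat{\mathcal J}^{-1}\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c), \qquad (5.23)$$

where $\hat{\mathcal I}$ and $\hat{\mathcal J}$ are consistent estimators of $\mathcal I$ and $\mathcal J$. A consistent estimator of $\mathcal J$ is obviously obtained as a sample mean. Estimating the long-run variance $\mathcal I$ requires more involved methods, such as those described on page 105 (HAC or other methods).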
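Formula (5.23) is a direct matrix computation once the score and the estimates $\hat{\mathcal J}$ and $\hat{\mathcal I}$ are available. The sketch below implements it under the assumption that the score satisfies the first-order condition, that is, it equals $R'\hat\lambda$. It also checks two consequences of the algebra. When $\hat{\mathcal I} = \hat{\mathcal J}$, the statistic reduces to (5.17). When $\hat{\mathcal I} = c\hat{\mathcal J}$, it is divided by $c$. The function name and the simulated matrices are illustrative.

```python
import numpy as np


def robust_lm(score, J_hat, I_hat, R, n):
    """Quasi-LM statistic (5.23) from the score d l_n(theta_c)/d theta (assumed equal
    to R' lambda_hat), the estimates J_hat, I_hat and the s x d restriction matrix R."""
    v = R @ np.linalg.solve(J_hat, score)                                # R J^{-1} score
    middle = R @ np.linalg.solve(J_hat, I_hat) @ np.linalg.solve(J_hat, R.T)  # R J^{-1} I J^{-1} R'
    return float(v @ np.linalg.solve(middle, v) / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, s, n = 4, 2, 100
    A = rng.standard_normal((d, d))
    J = A @ A.T + d * np.eye(d)                       # a positive definite "Hessian" estimate
    R = np.hstack([np.zeros((s, d - s)), np.eye(s)])  # tests the last s components
    score = R.T @ rng.standard_normal(s)              # first-order condition: score = R' lambda
    print(robust_lm(score, J, J, R, n), score @ np.linalg.solve(J, score) / n)
```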
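5.4.2 LM Test for Conditional Homoscedasticity

Consider testing the conditional homoscedasticity assumption

$$H_0:\ \alpha_{01} = \cdots = \alpha_{0q} = 0$$

in the ARCH(q) model

$$\varepsilon_t = \sigma_t\eta_t, \quad \eta_t \text{ iid } (0, 1), \qquad \sigma_t^2 = \omega_0 + \sum_{i=1}^q\alpha_{0i}\varepsilon_{t-i}^2, \quad \omega_0 > 0, \ \alpha_{0i} \ge 0.$$

At the parameter value $\theta = (\omega, \alpha_1, \dots, \alpha_q)'$ the quasi-log-likelihood is written, neglecting unimportant constants, as

$$\ell_n(\theta) = -\frac12\sum_{t=1}^n\left\{\frac{\varepsilon_t^2}{\sigma_t^2(\theta)} + \log\sigma_t^2(\theta)\right\}, \qquad \sigma_t^2(\theta) = \omega + \sum_{i=1}^q\alpha_i\varepsilon_{t-i}^2,$$

with the convention $\varepsilon_t = 0$ for $t \le 0$. The constrained quasi-maximum likelihood estimator is⁶

$$\hat\theta^c = (\hat\omega^c, 0, \dots, 0)', \quad\text{where}\quad \hat\omega^c = \sigma_t^2(\hat\theta^c) = \frac1n\sum_{t=1}^n\varepsilon_t^2.$$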


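At $\theta_0 = (\omega_0, 0, \dots, 0)'$, the score vector satisfies

$$\frac{1}{\sqrt n}\frac{\partial\ell_n(\theta_0)}{\partial\theta} = \frac{1}{2\sqrt n}\sum_{t=1}^n\frac{1}{\sigma_t^4(\theta_0)}\left\{\varepsilon_t^2 - \sigma_t^2(\theta_0)\right\}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} = \frac{1}{2\sqrt n}\sum_{t=1}^n\frac{\eta_t^2 - 1}{\omega_0}\begin{pmatrix}1\\ \varepsilon_{t-1}^2\\ \vdots\\ \varepsilon_{t-q}^2\end{pmatrix} \xrightarrow{d} \mathcal N\left(0,\ \mathcal I = \frac{\kappa_\eta - 1}{4\omega_0^2}\begin{pmatrix}1 & \boldsymbol\omega_0'\\ \boldsymbol\omega_0 & \mathcal I_{22}\end{pmatrix}\right)$$

under $H_0$, where $\boldsymbol\omega_0 = (\omega_0, \dots, \omega_0)' \in \mathbb R^q$ and $\mathcal I_{22}$ is a matrix whose diagonal elements are $\omega_0^2\kappa_\eta$, with $\kappa_\eta = E\eta_t^4$, and whose other entries are equal to $\omega_0^2$. The bottom right-hand block of $\mathcal I^{-1}$ is thus

$$\mathcal I^{22} = \frac{4\omega_0^2}{\kappa_\eta - 1}\left\{\mathcal I_{22} - \boldsymbol\omega_0\boldsymbol\omega_0'\right\}^{-1} = \frac{4\omega_0^2}{\kappa_\eta - 1}\left\{(\kappa_\eta - 1)\omega_0^2 I_q\right\}^{-1} = \frac{4}{(\kappa_\eta - 1)^2}I_q. \qquad (5.24)$$

In addition, we have

$$\frac1n\frac{\partial^2\ell_n(\theta_0)}{\partial\theta\,\partial\theta'} = \frac{1}{2n}\sum_{t=1}^n\frac{\sigma_t^2(\theta_0) - 2\varepsilon_t^2}{\sigma_t^6(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'} = -\frac{1}{2n\omega_0^2}\sum_{t=1}^n(2\eta_t^2 - 1)\begin{pmatrix}1\\ \varepsilon_{t-1}^2\\ \vdots\\ \varepsilon_{t-q}^2\end{pmatrix}\left(1, \varepsilon_{t-1}^2, \dots, \varepsilon_{t-q}^2\right) \to -\mathcal J = -\frac{2}{\kappa_\eta - 1}\mathcal I \quad\text{a.s.}$$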


6 Indeed, the function $\sigma^2 \mapsto x/\sigma^2 + n\log\sigma^2$ reaches its minimum at $\sigma^2 = x/n$.
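From (5.23), using estimators of $\mathcal I$ and $\mathcal J$ such that $\hat{\mathcal J} = 2\hat{\mathcal I}/(\hat\kappa_\eta - 1)$, we obtain

$$\mathrm{LM}_n = \frac{2}{\hat\kappa_\eta - 1}\,\frac1n\hat\lambda'R\hat{\mathcal J}^{-1}R'\hat\lambda = \frac1n\frac{\partial}{\partial\theta'}\ell_n(\hat\theta^c)\,\hat{\mathcal I}^{-1}\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c).$$

Using (5.24) and noting that

$$\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c) = \begin{pmatrix}0\\ \partial\ell_n(\hat\theta^c)/\partial\alpha\end{pmatrix}, \qquad \frac{\partial}{\partial\alpha_i}\ell_n(\hat\theta^c) = \frac{1}{2\hat\omega^c}\sum_{t=1}^n\left(\frac{\varepsilon_t^2}{\hat\omega^c} - 1\right)\varepsilon_{t-i}^2,$$

we obtain

$$\mathrm{LM}_n = \frac1n\frac{\partial\ell_n(\hat\theta^c)}{\partial\alpha'}\,\hat{\mathcal I}^{22}\,\frac{\partial\ell_n(\hat\theta^c)}{\partial\alpha} = \frac{1}{n(\hat\kappa_\eta - 1)^2}\sum_{i=1}^q\left\{\sum_{t=1}^n\left(\frac{\varepsilon_t^2}{\hat\omega^c} - 1\right)\frac{\varepsilon_{t-i}^2}{\hat\omega^c}\right\}^2. \qquad (5.25)$$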
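Equivalence with a Portmanteau Test

Using

$$\sum_{t=1}^n\left(\frac{\varepsilon_t^2}{\hat\omega^c} - 1\right) = 0 \quad\text{and}\quad \frac1n\sum_{t=1}^n\left(\frac{\varepsilon_t^2}{\hat\omega^c} - 1\right)^2 = \hat\kappa_\eta - 1,$$

it follows from (5.25) that

$$\mathrm{LM}_n = n\sum_{h=1}^q\hat\rho_{\varepsilon^2}^2(h), \qquad (5.26)$$

which shows that the LM test is equivalent to a portmanteau test on the squares.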
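The two forms (5.25) and (5.26) can be compared numerically. With the usual sample autocorrelation (sums over $t = h+1, \dots, n$), they differ only through end terms of order $\sum_{t \le h}(\varepsilon_t^2/\hat\omega^c - 1)$, which are negligible relative to the main terms. The sketch below assumes a simulated Gaussian ARCH(1) with $\omega = 1$, $\alpha = 0.5$ and $\varepsilon_0 = 0$. These values and the function names are illustrative only.

```python
import numpy as np


def simulate_arch1(n, omega=1.0, alpha=0.5, seed=0):
    """eps_t = sigma_t eta_t, sigma_t^2 = omega + alpha eps_{t-1}^2, eta_t iid N(0, 1), eps_0 = 0."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    eps = np.zeros(n)
    prev = 0.0
    for t in range(n):
        eps[t] = np.sqrt(omega + alpha * prev**2) * eta[t]
        prev = eps[t]
    return eps


def lm_arch_score_form(eps, q):
    """LM statistic (5.25), with the convention eps_t = 0 for t <= 0."""
    e2 = np.asarray(eps, dtype=float) ** 2
    n = len(e2)
    omega_c = e2.mean()                    # constrained QMLE of omega
    u = e2 / omega_c - 1.0                 # eps_t^2 / omega_c - 1, sums to zero
    kappa_m1 = np.mean(u**2)               # kappa_hat - 1
    total = sum(np.dot(u[i:], e2[:-i] / omega_c) ** 2 for i in range(1, q + 1))
    return float(total / (n * kappa_m1**2))


def lm_arch_portmanteau(eps, q):
    """Portmanteau form (5.26): n times the sum of the first q squared SACRs of eps_t^2."""
    e2 = np.asarray(eps, dtype=float) ** 2
    x = e2 - e2.mean()
    denom = np.dot(x, x)
    rho = np.array([np.dot(x[h:], x[:-h]) / denom for h in range(1, q + 1)])
    return float(len(e2) * np.sum(rho**2))


if __name__ == "__main__":
    eps = simulate_arch1(5000, seed=11)
    print(lm_arch_score_form(eps, 4), lm_arch_portmanteau(eps, 4))
```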


Expression in Terms of R²

To establish a connection with the linear model, write

$$\frac{\partial}{\partial\theta}\ell_n(\hat\theta^c) = -X'Y,$$

where $Y$ is the $n \times 1$ vector with entries $1 - \varepsilon_t^2/\hat\omega^c$, and $X$ is the $n \times (q+1)$ matrix with first column $1/(2\hat\omega^c)$ and $(i+1)$th column $\varepsilon_{t-i}^2/(2\hat\omega^c)$. Estimating $\mathcal I$ by $(\hat\kappa_\eta - 1)n^{-1}X'X$, where $\hat\kappa_\eta - 1 = n^{-1}Y'Y$, we obtain

$$\mathrm{LM}_n = n\,\frac{Y'X(X'X)^{-1}X'Y}{Y'Y}, \qquad (5.27)$$

which can be interpreted as $n$ times the determination coefficient in the linear regression of $Y$ on the columns of $X$. Because the determination coefficient is invariant under linear transformations of the variables (Exercise 5.11), we simply have $\mathrm{LM}_n = nR^2$. Here $R^2$ is the determination coefficient⁷ of the regression of $\varepsilon_t^2$ on a constant and the $q$ lagged variables $\varepsilon_{t-1}^2, \dots, \varepsilon_{t-q}^2$. Under the null hypothesis of conditional homoscedasticity, $\mathrm{LM}_n$ asymptotically follows a $\chi^2_q$. The version of the LM statistic given in (5.27) differs from the one given in (5.25) because (5.24) is not satisfied when $\mathcal I$ is replaced by $(\hat\kappa_\eta - 1)n^{-1}X'X$.
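The $nR^2$ version can be sketched in a few lines. The sketch regresses $\varepsilon_t^2$ on a constant and $\varepsilon_{t-1}^2, \dots, \varepsilon_{t-q}^2$ for $t = q+1, \dots, n$, and returns $(n - q)R^2$ with the centered $R^2$. Dropping the first $q$ observations, instead of using the convention $\varepsilon_t = 0$ for $t \le 0$, is an assumption that does not affect the asymptotic $\chi^2_q$ distribution. The simulated ARCH(1) and the function names are illustrative.

```python
import numpy as np


def simulate_arch1(n, omega=1.0, alpha=0.5, seed=0):
    """eps_t = sigma_t eta_t, sigma_t^2 = omega + alpha eps_{t-1}^2, eta_t iid N(0, 1), eps_0 = 0."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    eps = np.zeros(n)
    prev = 0.0
    for t in range(n):
        eps[t] = np.sqrt(omega + alpha * prev**2) * eta[t]
        prev = eps[t]
    return eps


def arch_lm_nr2(eps, q):
    """nR^2 form of the LM test: regress eps_t^2 on a constant and
    eps_{t-1}^2, ..., eps_{t-q}^2 (t = q+1, ..., n); return (n_eff * R^2, R^2)."""
    e2 = np.asarray(eps, dtype=float) ** 2
    n = len(e2)
    y = e2[q:]
    X = np.column_stack([np.ones(n - q)] + [e2[q - i:n - i] for i in range(1, q + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)  # centered R^2
    return float(len(y) * r2), float(r2)


if __name__ == "__main__":
    lm, r2 = arch_lm_nr2(simulate_arch1(5000, seed=11), q=4)
    # 5% critical value of a chi-square(4): 9.488
    print(round(lm, 1), round(r2, 3), "reject H0" if lm > 9.488 else "do not reject H0")
```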


7 We mean here the centered determination coefficient (the one usually given by standard software) not the 
uncentered one as was the case in Section 5.4.1. There is sometimes confusion between these coefficients in 
the literature. 
5.5 Application to Real Series


Consider the returns of the CAC 40 stock index from March 2, 1990 to December 29, 2006 (4245 
observations) and of the FTSE 100 index of the London Stock Exchange from April 3, 1984 
to April 3, 2007 (5812 observations). The correlograms for the returns and squared returns are 
displayed in Figure 5.7. The bottom correlograms of Figure 5.7, as well as the portmanteau tests 
of Table 5.4, clearly show that, for the two indices, the strong white noise assumption cannot 
be sustained. These portmanteau tests can be considered as versions of LM tests for conditional 
homoscedasticity (see Section 5.4.2). Table 5.5 displays the nR² version of the LM test of Section 5.4.2. Note that the two versions of the LM statistic are quite different but lead to the same
unambiguous conclusions: the hypothesis of no ARCH effect must be rejected, as well as the 
hypothesis of absence of autocorrelation for the CAC 40 or FTSE 100 returns. 

The first correlogram of Figure 5.7 and the first part of Table 5.6 lead us to think that the 
CAC 40 series is fairly compatible with a weak white noise structure (and hence with a GARCH 
structure). Recall that the 95% significance bands, shown as dotted lines on the upper correlograms 
of Figure 5.7, are valid under the strong white noise assumption but may be misleading for weak 
white noises (such as GARCH). The second part of Table 5.6 displays classical Ljung—Box tests 
for noncorrelation. It may be noted that the CAC 40 returns series does not pass the classical 
portmanteau tests. This does not mean, however, that the white noise assumption should be 


[Figure 5.7 panels: CAC Returns and FTSE Returns (top); Squared CAC Returns and Squared FTSE Returns (bottom); horizontal axis: lag, 0 to 35.]
Figure 5.7 Correlograms of returns and squared returns of the CAC 40 index (March 2, 1990 to December 29, 2006) and the FTSE 100 index (April 3, 1984 to April 3, 2007).


8 Classical portmanteau tests are those provided by standard commercial software, in particular those of 
the table entitled ‘Autocorrelation Check for White Noise’ of the ARIMA procedure in SAS. 
Table 5.4 Portmanteau tests on the squared CAC 40 returns (March 2, 1990 to December 29, 2006) and FTSE 100 returns (April 3, 1984 to April 3, 2007).

Tests for noncorrelation of the squared CAC 40

m                1          2          3          4          5          6
ρ̂ε²(m)       0.181      0.226      0.231      0.177      0.209      0.236
σ̂ρ̂ε²(m)      0.030      0.030      0.030      0.030      0.030      0.030
Q_m^LB     138.825    356.487    580.995    712.549    896.465   1133.276
p-value      0.000      0.000      0.000      0.000      0.000      0.000
m                7          8          9         10         11         12
ρ̂ε²(m)       0.202      0.206      0.184      0.198      0.201      0.173
σ̂ρ̂ε²(m)      0.030      0.030      0.030      0.030      0.030      0.030
Q_m^LB    1307.290   1486.941   1631.190   1798.789   1970.948   2099.029
p-value      0.000      0.000      0.000      0.000      0.000      0.000

Tests for noncorrelation of the squared FTSE 100

m                1          2          3          4          5          6
ρ̂ε²(m)       0.386      0.355      0.194      0.235      0.127      0.161
σ̂ρ̂ε²(m)      0.026      0.026      0.026      0.026      0.026      0.026
Q_m^LB     867.573   1601.808   1820.314   2141.935   2236.064   2387.596
p-value      0.000      0.000      0.000      0.000      0.000      0.000
m                7          8          9         10         11         12
ρ̂ε²(m)       0.160      0.151      0.115      0.148      0.141      0.135
σ̂ρ̂ε²(m)      0.030      0.030      0.030      0.030      0.030      0.030
Q_m^LB     964.803   1061.963   1118.258   1211.899   1296.512   1374.324
p-value      0.000      0.000      0.000      0.000      0.000      0.000


Table 5.5 LM tests for conditional homoscedasticity of the CAC 40 and FTSE 100.

Tests for absence of ARCH for the CAC 40

m            1       2       3       4       5       6       7       8       9
LM_n     138.7   303.3   421.7   451.7   500.8   572.4   600.3   621.6   629.7
p-value  0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000

Tests for absence of ARCH for the FTSE 100

m            1       2       3       4       5       6       7       8       9
LM_n     867.1  1157.3  1157.4  1220.8  1222.4  1236.6  1237.0  1267.0  1267.3
p-value  0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000


rejected. Indeed, we know that such classical portmanteau tests are invalid for conditionally hetero- 
scedastic series. 
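For reference, the classical Ljung–Box statistic can be computed from the sample autocorrelations as follows. This statistic is the one reported in Table 5.4 and in the "usual tests" parts of Tables 5.6 and 5.7. It is calibrated on a $\chi^2_m$ distribution, and that calibration is precisely what is not justified for GARCH-type white noise. The function name is illustrative.

```python
import numpy as np


def ljung_box(rho_hat, n):
    """Ljung-Box statistics Q_m^LB = n(n+2) sum_{h=1}^m rho_hat(h)^2 / (n-h),
    for m = 1, ..., len(rho_hat)."""
    rho_hat = np.asarray(rho_hat, dtype=float)
    h = np.arange(1, len(rho_hat) + 1)
    return n * (n + 2) * np.cumsum(rho_hat**2 / (n - h))


if __name__ == "__main__":
    # First two sample autocorrelations of the squared CAC 40 returns (Table 5.4), n = 4245.
    print(np.round(ljung_box([0.181, 0.226], 4245), 3))
```

With $\hat\rho(1) = 0.181$ and $n = 4245$, this gives about 139.17. That is in line with the 138.825 of Table 5.4, given that $\hat\rho(1)$ is reported to three decimals only.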

Table 5.7 is the analog of Table 5.6 for the FTSE 100 index. Conclusions are more disputable 
in this case. Although some p-values of the upper part of Table 5.7 are slightly less than 5%, 
one cannot exclude the possibility that the FTSE 100 index is a weak (GARCH) white noise. 
Table 5.6 Portmanteau tests on the CAC 40 (March 2, 1990 to December 29, 2006).

Tests of GARCH white noise based on Q_m

m              1        2        3        4        5        6        7        8
ρ̂(m)       0.016   -0.020   -0.045    0.015   -0.041   -0.023   -0.025    0.014
σ̂ρ̂(m)      0.041    0.044    0.044    0.041    0.043    0.044    0.042    0.043
Q_m        0.587    1.431    5.544    6.079    9.669   10.725   12.076   12.475
p-value    0.443    0.489    0.136    0.193    0.085    0.097    0.098    0.131
m              9       10       11       12       13       14       15       16
ρ̂(m)       0.000    0.011    0.010   -0.014    0.020    0.024    0.037    0.001
σ̂ρ̂(m)      0.041    0.042    0.042    0.041    0.043    0.040    0.040    0.040
Q_m       12.476   12.718   12.954   13.395   14.214   15.563   18.829   18.833
p-value    0.188    0.240    0.296    0.341    0.359    0.341    0.222    0.277

Usual tests for strong white noise

m              1        2        3        4        5        6        7        8
ρ̂(m)       0.016   -0.020   -0.045    0.015   -0.041   -0.023   -0.025    0.014
σ̂ρ̂(m)      0.030    0.030    0.030    0.030    0.030    0.030    0.030    0.030
Q_m^LB     1.105    2.882   11.614   12.611   19.858   22.134   24.826   25.629
p-value    0.293    0.237    0.009    0.013    0.001    0.001    0.001    0.001
m              9       10       11       12       13       14       15       16
ρ̂(m)       0.000    0.011    0.010   -0.014    0.020    0.024    0.037    0.001
σ̂ρ̂(m)      0.030    0.030    0.030    0.030    0.030    0.030    0.030    0.030
Q_m^LB    25.629   26.109   26.579   27.397   29.059   31.497   37.271   37.279
p-value    0.002    0.004    0.005    0.007    0.006    0.005    0.001    0.002


Table 5.7 Portmanteau tests on the FTSE 100 (April 3, 1984 to April 3, 2007).

Tests of GARCH white noise based on Q_m

m              1        2        3        4        5        6        7        8
ρ̂(m)       0.023   -0.002   -0.059    0.041   -0.021   -0.021   -0.006    0.039
σ̂ρ̂(m)      0.057    0.055    0.044    0.047    0.039    0.042    0.037    0.042
Q_m        0.618    0.624    7.398   10.344   11.421   12.427   12.527   15.796
p-value    0.432    0.732    0.060    0.035    0.044    0.053    0.085    0.045
m              9       10       11       12       13       14       15       16
ρ̂(m)       0.029    0.000    0.019   -0.003    0.023   -0.013    0.019   -0.022
σ̂ρ̂(m)      0.036    0.041    0.038    0.037    0.037    0.036    0.035    0.039
Q_m       18.250   18.250   19.250   19.279   20.700   21.191   22.281   23.483
p-value    0.032    0.051    0.057    0.082    0.079    0.097    0.101    0.101

Usual tests for strong white noise

m              1        2        3        4        5        6        7        8
ρ̂(m)       0.023   -0.002   -0.059    0.041   -0.021   -0.021   -0.006    0.039
σ̂ρ̂(m)      0.026    0.026    0.026    0.026    0.026    0.026    0.026    0.026
Q_m^LB     3.019    3.047   23.053   32.981   35.442   38.088   38.294   47.019
p-value    0.082    0.218    0.000    0.000    0.000    0.000    0.000    0.000
m              9       10       11       12       13       14       15       16
ρ̂(m)       0.029    0.000    0.019   -0.003    0.023   -0.013    0.019   -0.022
σ̂ρ̂(m)      0.026    0.026    0.026    0.026    0.026    0.026    0.026    0.026
Q_m^LB    51.874   51.874   54.077   54.139   57.134   58.098   60.173   62.882
p-value    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
Table 5.8 Studentized statistics for the corner method for the CAC 40 series and selected ARMA orders.

 j\i    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
  1   0.8  -0.9  -2.0   0.7  -1.9  -1.0  -1.2   0.6   0.0   0.5   0.5  -0.7   0.9     ?   1.8
  2   0.9   0.8   1.1     ?   1.1  -0.3   0.8     ?  -0.4   0.2   0.4   0.0     ?  -0.1
  3  -2.0   1.1  -0.9  -1.0  -0.6   0.8  -0.5   0.4  -0.1   0.3   0.3   0.4   0.5
  4   0.8  -1.1   1.0  -0.4   0.7  -0.5   0.2   0.4   0.4   0.3  -0.2  -0.3
  5  -2.0   1.1  -0.6   0.7  -0.6   0.3  -0.3   0.2   0.0   0.3     ?
  6   1.0  -0.3  -0.8  -0.5  -0.3   0.2   0.3   0.1  -0.2   0.3
  7     ?   0.7  -0.4   0.2   0.3     ?     ?     ?     ?
  8  -0.4   0.0  -0.3   0.3  -0.1  -0.1  -0.3   0.4
  9  -0.1  -0.2  -0.1   0.3  -0.1   0.2  -0.3
 10  -0.4   0.2  -0.3   0.2  -0.3   0.3
 11   0.5   0.4   0.2  -0.1   0.2
 12   0.8   0.1  -0.3  -0.3
 13   1.0   0.8  -0.5
 14  -1.1  -0.2
 15   1.8

ARMA (P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL
PROBA      CRIT   MODELS FOUND
0.200000   1.28   ( 1, 1)
0.100000   1.64   ( 1, 1)
0.050000   1.96   ( 0, 3)  ( 1, 1)  ( 5, 0)
0.020000   2.33   ( 0, 0)
0.010000   2.58   ( 0, 0)
0.005000   2.81   ( 0, 0)
0.002000   3.09   ( 0, 0)
0.001000   3.29   ( 0, 0)
0.000100   3.72   ( 0, 0)
0.000010   4.26   ( 0, 0)


On the other hand, the assumption of strong white noise can be categorically rejected, the p-values (bottom of Table 5.7) being almost equal to zero. Table 5.8 confirms the identification of an ARMA(0, 0) process for the CAC 40. Table 5.9 would lead us to select an ARMA(0, 0), ARMA(1, 1), AR(3) or MA(3) model for the FTSE 100. Recall that this a priori identification step should be completed by an estimation of the selected models, followed by a validation step. For the CAC 40, Table 5.10 indicates that the most reasonable GARCH model is simply the GARCH(1, 1). For the FTSE 100, plausible models are the GARCH(2, 1), GARCH(2, 2), GARCH(2, 3), or ARCH(4), as can be seen from Table 5.11. The choice between these models is the object of the estimation and validation steps.


5.6 Bibliographical Notes 


In this chapter, we have adapted tools generally employed to deal with the identification of 
ARMA models. Correlograms and partial correlograms are studied in depth in the book by 
Brockwell and Davis (1991). In particular, they provide a detailed proof for the Bartlett for- 
mula giving the asymptotic behavior of the sample autocorrelations of a strong linear process. 
Table 5.9 Studentized statistics for the corner method for the FTSE 100 series and selected ARMA orders.

 j\i    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
  1   0.8  -0.1     ?   1.7  -1.0  -1.0  -0.3   1.8   1.6   0.0   1.1  -0.2   1.3  -0.7   1.2
  2   0.1   0.8   1.2   0.2   1.0   0.3   0.9   1.0   0.6  -0.9   0.5  -0.8   0.6  -0.4
  3  -2.6     ?  -0.7  -0.6  -0.7   0.8  -0.4   0.5   0.7   0.3  -0.3  -0.1   0.2
  4  -1.8   0.3   0.6  -0.7     ?   0.0  -0.4   0.4   0.6  -0.3   0.2  -0.1
  5  -1.1   1.1  -0.7   0.6  -0.6   0.5  -0.3   0.5   0.5   0.1   0.2
  6     ?   0.5  -0.8   0.2  -0.4   0.6   0.5   0.5   0.4   0.2
  7   0.0   0.9  -0.2  -0.3   0.0   0.5   0.5   0.4   0.3
  8  -1.6   0.7  -0.3   0.2  -0.4   0.4  -0.4   0.3
  9     ?   0.5   0.6   0.5   0.4   0.3   0.2
 10   0.0  -0.9  -0.4  -0.2  -0.1   0.0
 11     ?   0.6   0.0     ?   0.1
 12   0.2  -0.8   0.0     ?
 13     ?     ?   0.1
 14   0.5  -0.6
 15     ?

ARMA (P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL
PROBA      CRIT   MODELS FOUND
0.200000   1.28   ( 0,13)  ( 1, 2)  ( 9, 0)
0.100000   1.64   ( 0, 8)  ( ?, ?)  ( 4, 0)
0.050000   1.96   ( 0, 3)  ( 1, 1)  ( 3, 0)
0.020000   2.33   ( 0, 3)  ( 1, 1)  ( 3, 0)
0.010000   2.58   ( 0, 3)  ( 1, 1)  ( 3, 0)
0.005000   2.81   ( 0, 0)
0.002000   3.09   ( 0, 0)
0.001000   3.29   ( 0, 0)
0.000100   3.72   ( 0, 0)
0.000010   4.26   ( 0, 0)


The generalized Bartlett formula (B.15) was established by Francq and Zakoian (2009d). The 
textbook by Li (2004) can serve as a reference for the various portmanteau adequacy tests, as 
well as Godfrey (1988) for the LM tests. It is now well known that tools generally used for 
the identification of ARMA models should not be directly used in presence of conditional het- 
eroscedasticity, or other forms of dependence in the linear innovation process (see, for instance, 
Diebold, 1986; Romano and Thombs, 1996; Berlinet and Francq, 1997; or Francq, Roy and 
Zakoian, 2005). The corner method was proposed by Béguin, Gouriéroux and Monfort (1980) 
for the identification of mixed ARMA models. There are many alternatives to the corner method, 
in particular the epsilon algorithm (see Berlinet, 1984) and the generalized autocorrelations of 
Glasbey (1982). 

Additional references on tests of ARCH effects are Engle (1982, 1984), Bera and Higgins 
(1997) and Li (2004). 

In this chapter we have assumed the existence of a fourth-order moment for the observed 
process. When only the second-order moment exists, Basrak, Davis and Mikosch (2002) 
showed in particular that the sample autocorrelations converge very slowly. When even the 
second-order moment does not exist, the sample autocorrelations have a degenerate asymptotic 
distribution. 
Table 5.10 Studentized statistics for the corner method for the squared CAC 40 series and
selected GARCH orders. 


Concerning the HAC estimators of a long-run variance matrix, see, for instance, Andrews (1991) and Andrews and Monahan (1992). The method based on the spectral density at 0 of an AR model follows from Berk (1974). A comparison with the HAC method is proposed in den Haan and Levin (1997).


5.7 Exercises 


5.1 (Asymptotic behavior of the SACVs of a martingale difference)
Let $(\varepsilon_t)$ denote a martingale difference sequence such that $E\varepsilon_t^4 < \infty$ and $\hat\gamma(h) = n^{-1}\sum_{t=1}^{n-h}\varepsilon_t\varepsilon_{t+h}$. By applying Corollary A.1, derive the asymptotic distribution of $n^{1/2}\hat\gamma(h)$ for $h \neq 0$.

5.2 (Asymptotic behavior of $n^{1/2}\hat\gamma(1)$ for an ARCH(1) process)
Consider the stationary nonanticipative solution of an ARCH(1) process

$$\varepsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2, \qquad (5.28)$$
Table 5.11 Studentized statistics for the corner method for the squared FTSE 100 series and selected GARCH orders.

max(p,q)\p    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
  1         5.7  11.7   5.8  12.9   2.2   3.8   2.5   2.9   2.8   3.9   2.3   2.9   1.9   3.6   2.3
  2           ?     ?     ?   2.2     ?   4.3     ?   1.5  -1.7     ?  -1.0     ?   0.3     ?
  3        -0.1  -7.7   2.3  -0.2   0.6     ?   0.5   0.3     ?     ?   0.5     ?     ?
  4           ?     ?  -0.1  -0.4  -0.2     ?     ?     ?  -0.3     ?   0.1     ?
  5        -0.3     ?   0.5     ?     ?  -0.9   0.7  -0.2   0.8   1.4  -0.2
  6        -1.9   1.6   0.6   1.4   0.9   0.4  -0.7   0.9  -1.4   1.2
  7         0.7     ?     ?     ?     ?     ?     ?     ?     ?
  8        -1.2   0.7  -0.3   0.5  -0.6   0.7  -0.8  -0.5
  9           ?     ?     ?     ?     ?     ?     ?
 10        -1.6     ?  -0.8     ?  -0.9     ?
 11         0.6   0.7   0.7   0.2   1.1
 12           ?  -0.4  -0.9  -1.2
 13           ?   0.9   0.8
 14         0.3  -0.9
 15         0.8

GARCH (p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL
PROBA      CRIT   MODELS FOUND
0.200000   1.28   ( 1, 6)  ( 0,12)
0.100000   1.64   ( 1, 4)  ( 0,12)
0.050000   1.96   ( 2, 3)  ( 0, 4)
0.020000   2.33   ( 2, 1)  ( 2, 2)  ( 0, 4)
0.010000   2.58   ( 2, 1)  ( 2, 2)  ( 0, 4)
0.005000   2.81   ( 2, 1)  ( 2, 2)  ( 0, 4)
0.002000   3.09   ( 2, 1)  ( 2, 2)  ( 0, 4)
0.001000   3.29   ( 2, 1)  ( 2, 2)  ( 0, 4)
0.000100   3.72   ( 2, 1)  ( 2, 2)  ( 0, 4)
0.000010   4.26   ( 2, 1)  ( 2, 2)  ( ?, 3)  ( 0, 4)


where $(\eta_t)$ is a strong white noise with unit variance and $\mu_4\alpha^2 < 1$ with $\mu_4 = E\eta_t^4$. Derive the asymptotic distribution of $n^{1/2}\hat\gamma(1)$.

5.3 (Asymptotic behavior of $n^{1/2}\hat\rho(1)$ for an ARCH(1) process)
For the ARCH(1) model of Exercise 5.2, derive the asymptotic distribution of $n^{1/2}\hat\rho(1)$. What is the asymptotic variance of this statistic when $\alpha = 0$? Draw this asymptotic variance as a function of $\alpha$ and conclude accordingly.

5.4 (Asymptotic behavior of the SACRs of a GARCH(1, 1) process)
For the GARCH(1, 1) model of Exercise 2.8, derive the asymptotic distribution of $n^{1/2}\hat\rho(h)$, for $h \neq 0$ fixed.

5.5 (Moment of order 4 of a GARCH(1, 1) process)
For the GARCH(1, 1) model of Exercise 2.8, compute $E\varepsilon_t\varepsilon_{t+1}\varepsilon_s\varepsilon_{s+2}$.

5.6 (Asymptotic covariance between the SACRs of a GARCH(1, 1) process)
For the GARCH(1, 1) model of Exercise 2.8, compute

$$\operatorname{Cov}\left\{n^{1/2}\hat\rho(1),\ n^{1/2}\hat\rho(2)\right\}.$$
5.7 (First five SACRs of a GARCH(1, 1) process)
Evaluate numerically the asymptotic variance of the vector $\sqrt n\,(\hat\rho(1),\dots,\hat\rho(5))'$ of the first five SACRs of the GARCH(1, 1) model defined by
$$\epsilon_t = \sigma_t\eta_t, \quad \eta_t \text{ iid } \mathcal N(0,1), \qquad \sigma_t^2 = 1 + 0.3\epsilon_{t-1}^2 + 0.55\sigma_{t-1}^2.$$

5.8 (Generalized Bartlett formula for an MA(q)-ARCH(1) process)
Suppose that $X_t$ follows an MA(q) of the form
$$X_t = \epsilon_t - \sum_{i=1}^q b_i\epsilon_{t-i},$$
where the error term is an ARCH(1) process
$$\epsilon_t = \sigma_t\eta_t, \quad \sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2, \quad \eta_t \text{ iid } \mathcal N(0,1), \quad \alpha < 1/\sqrt 3.$$
How is the generalized Bartlett formula (B.15) expressed for $i = j > q$?

5.9 (Fisher information matrix for dynamic regression model)
In the regression model $Y_t = F_{\theta_0}(W_t) + \epsilon_t$ introduced on page 112, suppose that $(\epsilon_t)$ is a $\mathcal N(0, \sigma_0^2)$ white noise. Suppose also that the regularity conditions entailing (5.14) hold. Give an explicit form to the blocks of the matrix $\mathfrak I$, and consider the case where $\sigma^2$ does not depend on $\theta$.

5.10 (LM tests in a linear regression model)
Consider the regression model
$$Y = X_1\beta_1 + X_2\beta_2 + U,$$
where $Y = (Y_1,\dots,Y_n)'$ is the dependent vector variable, $X_i$ is an $n\times k_i$ matrix of explanatory variables with rank $k_i$ ($i = 1, 2$), and the vector $U$ is a $(0, \sigma^2I_n)$ error term. Derive the LM test of the hypothesis $H_0: \beta_2 = 0$. Consider the case $X_1'X_2 = 0$ and the general case.

5.11 (Centered and uncentered $R^2$)
Consider the regression model
$$Y_t = \beta_1X_{t1} + \dots + \beta_kX_{tk} + \epsilon_t, \qquad t = 1,\dots,n,$$
where the $\epsilon_t$ are iid, centered, and have a variance $\sigma^2 > 0$. Let $Y = (Y_1,\dots,Y_n)'$ be the vector of dependent variables, $X = (X_{ti})$ the $n\times k$ matrix of explanatory variables, $\epsilon = (\epsilon_1,\dots,\epsilon_n)'$ the vector of the error terms and $\beta = (\beta_1,\dots,\beta_k)'$ the parameter vector. Let $P_X = X(X'X)^{-1}X'$ denote the orthogonal projection matrix on the vector subspace generated by the columns of $X$.

The uncentered determination coefficient is defined by
$$R_{nc}^2 = \frac{\|\hat Y\|^2}{\|Y\|^2}, \qquad \hat Y = P_XY, \tag{5.29}$$
and the (centered) determination coefficient is defined by
$$R^2 = 1 - \frac{\|\hat\epsilon\|^2}{\|Y - \bar Ye\|^2}, \qquad \hat\epsilon = Y - \hat Y, \quad e = (1,\dots,1)', \quad \bar Y = \frac1n\sum_{t=1}^nY_t. \tag{5.30}$$
Let $T$ denote a $k\times k$ invertible matrix, $c$ a number different from 0 and $d$ any number. Let $\tilde Y = cY + de$ and $\tilde X = XT$. Show that if $\bar Y = 0$ and if $e$ belongs to the vector subspace generated by the columns of $X$, then $R_{nc}^2$ defined by (5.29) is equal to the determination coefficient in the regression of $\tilde Y$ on the columns of $\tilde X$.

5.12 (Identification of the DAX and the S&P 500)
From the address http://fr.biz.yahoo.com//bourse/accueil.htm1 download the series of DAX and S&P 500 stock indices. Carry out a study similar to that of Section 5.5 and deduce a selection of plausible models.


Estimating ARCH Models 
by Least Squares 


The simplest estimation method for ARCH models is that of ordinary least squares (OLS). This estimation procedure has the advantage of being numerically simple, but has two drawbacks: (i) the OLS estimator is not efficient and is outperformed by methods based on the likelihood or on the quasi-likelihood that will be presented in the next chapters; (ii) in order to provide asymptotically normal estimators, the method requires moments of order 8 for the observed process. An extension of the OLS method, the feasible generalized least squares (FGLS) method, removes the first drawback and attenuates the second: it provides estimators that are asymptotically as accurate as the quasi-maximum likelihood estimator, under the assumption that moments of order 4 exist. Note that the least-squares methods are of interest in practice because they provide initial estimators for the optimization procedure that is used in the quasi-maximum likelihood method.

We begin with the unconstrained OLS and FGLS estimators. Then, in Section 6.3, we will see 
how to take into account positivity constraints on the parameters. 


6.1 Estimation of ARCH(q) Models by Ordinary Least Squares


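In this section, we consider the OLS estimator of the ARCH(q) model:
$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega_0 + \sum_{i=1}^q\alpha_{0i}\epsilon_{t-i}^2 \quad\text{with } \omega_0 > 0,\ \alpha_{0i} \ge 0,\ i = 1,\dots,q, \tag{6.1}$$
$(\eta_t)$ is an iid sequence, $E(\eta_t) = 0$, $\mathrm{Var}(\eta_t) = 1$.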


The OLS method uses the AR representation on the squares of the observed process. No assumption is made on the law of $\eta_t$.
The true value of the vector of the parameters is denoted by $\theta_0 = (\omega_0, \alpha_{01},\dots,\alpha_{0q})'$ and we denote by $\theta$ a generic value of the parameter.
From (6.1) we obtain the AR(q) representation
$$\epsilon_t^2 = \omega_0 + \sum_{i=1}^q\alpha_{0i}\epsilon_{t-i}^2 + u_t, \tag{6.2}$$


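where $u_t = \epsilon_t^2 - \sigma_t^2 = (\eta_t^2 - 1)\sigma_t^2$. The sequence $(u_t, \mathcal F_t)_t$ constitutes a martingale difference when $E\epsilon_t^2 = \sigma^2 < \infty$, denoting by $\mathcal F_t$ the $\sigma$-field generated by $\{\epsilon_s : s \le t\}$.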

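Assume that we observe $\epsilon_1,\dots,\epsilon_n$, a realization of length $n$ of the process $(\epsilon_t)$, and let $\epsilon_0,\dots,\epsilon_{1-q}$ be initial values. For instance, the initial values can be chosen equal to zero. Introducing the vector
$$Z_{t-1}' = \left(1, \epsilon_{t-1}^2, \dots, \epsilon_{t-q}^2\right),$$
in view of (6.2) we obtain the system
$$\epsilon_t^2 = Z_{t-1}'\theta_0 + u_t, \qquad t = 1,\dots,n, \tag{6.3}$$
which can be written as
$$Y = X\theta_0 + U,$$
with the $n\times(q+1)$ matrix
$$X = \begin{pmatrix}1 & \epsilon_0^2 & \cdots & \epsilon_{-q+1}^2\\ \vdots & \vdots & & \vdots\\ 1 & \epsilon_{n-1}^2 & \cdots & \epsilon_{n-q}^2\end{pmatrix} = \begin{pmatrix}Z_0'\\ \vdots\\ Z_{n-1}'\end{pmatrix}$$
and the $n\times 1$ vectors
$$Y = \begin{pmatrix}\epsilon_1^2\\ \vdots\\ \epsilon_n^2\end{pmatrix}, \qquad U = \begin{pmatrix}u_1\\ \vdots\\ u_n\end{pmatrix}.$$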


Assume that the matrix $X'X$ is invertible, or equivalently that $X$ has full column rank (we will see that this is always the case asymptotically, and thus for $n$ large enough). The OLS estimator of $\theta_0$ follows:
$$\hat\theta_n := \arg\min_\theta\|Y - X\theta\|^2 = (X'X)^{-1}X'Y. \tag{6.4}$$
Under assumptions OLS1 and OLS2 below the variance of $u_t$ exists and is constant. The OLS estimator of $\sigma_u^2 = \mathrm{Var}(u_t)$ is
$$\hat\sigma_u^2 = \frac{1}{n-q-1}\|Y - X\hat\theta_n\|^2 = \frac{1}{n-q-1}\sum_{t=1}^n\left(\epsilon_t^2 - Z_{t-1}'\hat\theta_n\right)^2.$$
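For $q = 1$ the matrix $X'X$ is $2\times 2$ and (6.4) can be solved in closed form. The following Python sketch computes $\hat\theta_n$ and $\hat\sigma_u^2$ that way. It rests on a few assumptions. The simulator draws Gaussian $\eta_t$, which is only an illustrative choice, since the OLS method itself needs no distributional assumption. The helper names `simulate_arch1` and `ols_arch1` are hypothetical. The initial value $\epsilon_0^2$ is set to 0, as suggested above.

```python
import random


def simulate_arch1(n, omega, alpha, burn=500, seed=0):
    """Simulate n values of an ARCH(1): eps_t = sigma_t * eta_t,
    sigma_t^2 = omega + alpha * eps_{t-1}^2, with eta_t iid N(0, 1)
    (Gaussian noise is an illustrative assumption)."""
    rng = random.Random(seed)
    prev_eps2 = 0.0
    out = []
    for t in range(n + burn):
        sigma2 = omega + alpha * prev_eps2
        eps = sigma2 ** 0.5 * rng.gauss(0.0, 1.0)
        prev_eps2 = eps * eps
        if t >= burn:          # discard a burn-in to get close to stationarity
            out.append(eps)
    return out


def ols_arch1(eps):
    """OLS estimator (6.4) for q = 1 and the estimator of sigma_u^2.

    Regresses Y = (eps_1^2, ..., eps_n^2)' on the columns (1, eps_{t-1}^2)
    with initial value eps_0^2 = 0. Returns (omega_hat, alpha_hat, s2_u)."""
    y = [e * e for e in eps]
    x = [0.0] + y[:-1]                      # eps_{t-1}^2 for t = 1, ..., n
    n = len(y)
    # Entries of X'X and X'Y
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx                 # det(X'X), nonzero iff rank(X) = 2
    omega_hat = (sxx * sy - sx * sxy) / det
    alpha_hat = (n * sxy - sx * sy) / det
    # Estimator of sigma_u^2 with divisor n - q - 1 = n - 2
    rss = sum((yt - omega_hat - alpha_hat * xt) ** 2 for xt, yt in zip(x, y))
    return omega_hat, alpha_hat, rss / (n - 2)


eps = simulate_arch1(20000, omega=1.0, alpha=0.2)
print(ols_arch1(eps))   # compare with the true values omega = 1.0, alpha = 0.2
```

With $\alpha_{01} = 0.2$ and Gaussian noise the eighth-order moment of $\epsilon_t$ exists, so the asymptotic normality result below applies. Rerunning the sketch with larger values of $\alpha_{01}$ gives a feel for the loss of accuracy discussed in Example 6.1.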


Remark 6.1 (OLS estimator of a GARCH model) An OLS estimator can also be defined for a GARCH(p, q) model, but the estimator is not explicit, because $\epsilon_t^2$ does not satisfy an AR model when $p \neq 0$ (see Exercise 7.5).

To establish the consistency of the OLS estimators of $\theta_0$ and $\sigma_u^2$, we must consider the following assumptions.
OLS1: $(\epsilon_t)$ is the nonanticipative strictly stationary solution of model (6.1), and $\omega_0 > 0$.
OLS2: $E\epsilon_t^4 < +\infty$.
OLS3: $P(\eta_t^2 = 1) \neq 1$.

Explicit conditions for assumptions OLS1 and OLS2 were given in Chapter 2. Assumption OLS3, which states that the law of $\eta_t^2$ is nondegenerate, allows us to identify the parameters. The assumption also guarantees the invertibility of $X'X$ for $n$ large enough.


Theorem 6.1 (Consistency of the OLS estimator of an ARCH model) Let $(\hat\theta_n)$ be a sequence of estimators satisfying (6.4). Under assumptions OLS1–OLS3, almost surely
$$\hat\theta_n \to \theta_0, \qquad \hat\sigma_u^2 \to \sigma_u^2, \qquad \text{as } n \to \infty.$$


Proof. The proof consists of several steps. 


(i) We have seen (Theorem 2.4) that $(\epsilon_t)$, the unique nonanticipative stationary solution of the model, is ergodic. The process $(Z_t)$ is also ergodic because $Z_t$ is a measurable function of $\{\epsilon_{t-i}, i \ge 0\}$. The ergodic theorem (see Theorem A.2) then entails that
$$\frac1nX'X = \frac1n\sum_{t=1}^nZ_{t-1}Z_{t-1}' \to E(Z_{t-1}Z_{t-1}'), \quad \text{a.s., as } n \to \infty. \tag{6.5}$$


The existence of the expectation is guaranteed by assumption OLS2. Note that the initial values are involved only in a fixed number of terms of the sum, and thus they do not matter for the asymptotic result. Similarly, we have
$$\frac1nX'U = \frac1n\sum_{t=1}^nZ_{t-1}u_t \to E(Z_{t-1}u_t), \quad \text{a.s., as } n \to \infty.$$
(ii) The invertibility of the matrix $EZ_{t-1}Z_{t-1}' = EZ_tZ_t'$ is shown by contradiction. Assume that there exists a nonzero vector $c$ of $\mathbb R^{q+1}$ such that $c'EZ_tZ_t'c = 0$. Thus $E\{c'Z_t\}^2 = 0$, and it follows that $c'Z_t = 0$ a.s. Therefore, there exists a linear combination of the variables $\epsilon_t^2,\dots,\epsilon_{t-q+1}^2$ which is a.s. equal to a constant. Without loss of generality, one can assume that, in this linear combination, the coefficient of $\epsilon_t^2 = \eta_t^2\sigma_t^2$ is 1. Thus $\eta_t^2$ is a.s. a measurable function of the variables $\epsilon_{t-1},\dots,\epsilon_{t-q}$. However, the solution being nonanticipative, $\eta_t^2$ is independent of these variables. This implies that $\eta_t^2$ is a.s. equal to a constant. This constant is necessarily equal to 1, but this leads to a contradiction with OLS3. Thus $E(Z_{t-1}Z_{t-1}')$ is invertible.


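(iii) The innovation of $\epsilon_t^2$ being $u_t = \epsilon_t^2 - \sigma_t^2 = \epsilon_t^2 - E(\epsilon_t^2 \mid \mathcal F_{t-1})$, we have the orthogonality relations
$$E(u_t) = E(u_t\epsilon_{t-1}^2) = \dots = E(u_t\epsilon_{t-q}^2) = 0,$$
that is
$$E(Z_{t-1}u_t) = 0.$$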


(iv) Point (ii) shows that $n^{-1}X'X$ is a.s. invertible for $n$ large enough and that, almost surely, as $n \to \infty$,
$$\hat\theta_n - \theta_0 = \left(\frac{X'X}{n}\right)^{-1}\frac{X'U}{n} \to \{E(Z_{t-1}Z_{t-1}')\}^{-1}E(Z_{t-1}u_t) = 0.$$
For the asymptotic normality of the OLS estimator, we need the following additional assumption.

OLS4: $E(\epsilon_t^8) < +\infty$.


Consider the $(q+1)\times(q+1)$ matrices
$$A = E(Z_{t-1}Z_{t-1}'), \qquad B = E(\sigma_t^4Z_{t-1}Z_{t-1}').$$
The invertibility of $A$ was established in the proof of Theorem 6.1, and the invertibility of $B$ is shown by the same argument, noting that $c'\sigma_t^2Z_{t-1} = 0$ if and only if $c'Z_{t-1} = 0$ because $\sigma_t^2 > 0$ a.s. The following result establishes the asymptotic normality of the OLS estimator.

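Let $\kappa_\eta = E\eta_t^4$.
Theorem 6.2 (Asymptotic normality of the OLS estimator) Under assumptions OLS1–OLS4,
$$\sqrt n(\hat\theta_n - \theta_0) \xrightarrow{\mathcal L} \mathcal N\left(0, (\kappa_\eta - 1)A^{-1}BA^{-1}\right).$$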


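Proof. In view of (6.3), we have $\hat\theta_n = (X'X)^{-1}X'(X\theta_0 + U) = \theta_0 + (X'X)^{-1}X'U$. Thus
$$\sqrt n(\hat\theta_n - \theta_0) = \left(\frac1n\sum_{t=1}^nZ_{t-1}Z_{t-1}'\right)^{-1}\left\{\frac{1}{\sqrt n}\sum_{t=1}^nZ_{t-1}u_t\right\}. \tag{6.6}$$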


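Let $\lambda \in \mathbb R^{q+1}$, $\lambda \neq 0$. The sequence $(\lambda'Z_{t-1}u_t, \mathcal F_t)$ is a square integrable ergodic stationary martingale difference, with variance
$$\mathrm{Var}(\lambda'Z_{t-1}u_t) = \lambda'E(Z_{t-1}Z_{t-1}'u_t^2)\lambda = \lambda'E\left\{Z_{t-1}Z_{t-1}'(\eta_t^2 - 1)^2\sigma_t^4\right\}\lambda = (\kappa_\eta - 1)\lambda'B\lambda,$$
the last equality following from the independence between $\eta_t$ and $(Z_{t-1}, \sigma_t)$, together with $E(\eta_t^2 - 1)^2 = \kappa_\eta - 1$. By the CLT (see Corollary A.1) we obtain that, for all $\lambda \neq 0$,
$$\frac{1}{\sqrt n}\sum_{t=1}^n\lambda'Z_{t-1}u_t \xrightarrow{\mathcal L} \mathcal N\left(0, (\kappa_\eta - 1)\lambda'B\lambda\right).$$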


Using the Cramér–Wold device, it follows that
$$\frac{1}{\sqrt n}\sum_{t=1}^nZ_{t-1}u_t \xrightarrow{\mathcal L} \mathcal N\left(0, (\kappa_\eta - 1)B\right). \tag{6.7}$$
The conclusion follows from (6.5), (6.6) and (6.7).
Remark 6.2 (Estimation of the information matrices) Consistent estimators $\hat A$ and $\hat B$ of the matrices $A$ and $B$ are obtained by replacing the theoretical moments by their empirical counterparts,
$$\hat A = \frac1n\sum_{t=1}^nZ_{t-1}Z_{t-1}', \qquad \hat B = \frac1n\sum_{t=1}^n\hat\sigma_t^4Z_{t-1}Z_{t-1}',$$
where $\hat\sigma_t^2 = Z_{t-1}'\hat\theta_n$. The fourth-order moment of the process $\eta_t = \epsilon_t/\sigma_t$ is also consistently estimated by $\hat\kappa_\eta = n^{-1}\sum_{t=1}^n(\epsilon_t/\hat\sigma_t)^4$. Finally, a consistent estimator of the asymptotic variance of the OLS estimator is defined by
$$\widehat{\mathrm{Var}}_{as}\{\sqrt n(\hat\theta_n - \theta_0)\} = (\hat\kappa_\eta - 1)\hat A^{-1}\hat B\hat A^{-1}.$$
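For an ARCH(1), the estimator of Remark 6.2 involves only $2\times 2$ matrices, so it can be written out explicitly. The Python sketch below does this; the function names are hypothetical, $\epsilon_0^2 = 0$ is used as initial value, and the simulated data (Gaussian $\eta_t$) are only there to exercise the code. The sketch assumes that all fitted volatilities $\hat\sigma_t^2$ are positive, which is not guaranteed in small samples (see Section 6.3).

```python
import random


def simulate_arch1(n, omega, alpha, burn=500, seed=0):
    """Simulated ARCH(1) path with Gaussian eta_t (illustrative assumption)."""
    rng = random.Random(seed)
    prev_eps2, out = 0.0, []
    for t in range(n + burn):
        eps = (omega + alpha * prev_eps2) ** 0.5 * rng.gauss(0.0, 1.0)
        prev_eps2 = eps * eps
        if t >= burn:
            out.append(eps)
    return out


def ols_variance_estimate_arch1(eps):
    """(kappa_hat - 1) A_hat^{-1} B_hat A_hat^{-1} of Remark 6.2 for q = 1."""
    y = [e * e for e in eps]
    x = [0.0] + y[:-1]                         # eps_{t-1}^2, with eps_0^2 = 0
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    w = (sxx * sy - sx * sxy) / det            # OLS fit (6.4)
    a = (n * sxy - sx * sy) / det
    s2 = [w + a * xt for xt in x]              # sigma_hat_t^2 = Z_{t-1}' theta_hat_n
    if min(s2) <= 0:
        raise ValueError("non-positive fitted volatility")
    # A_hat and B_hat, with Z_{t-1} = (1, eps_{t-1}^2)'
    a11, a12, a22 = 1.0, sx / n, sxx / n
    b11 = sum(v * v for v in s2) / n
    b12 = sum(v * v * xt for v, xt in zip(s2, x)) / n
    b22 = sum(v * v * xt * xt for v, xt in zip(s2, x)) / n
    kappa = sum((yt / v) ** 2 for yt, v in zip(y, s2)) / n   # mean of (eps_t/sigma_hat_t)^4
    # Inverse of the symmetric matrix A_hat
    d = a11 * a22 - a12 * a12
    i11, i12, i22 = a22 / d, -a12 / d, a11 / d
    # M = A_hat^{-1} B_hat, then V = (kappa_hat - 1) M A_hat^{-1}
    m11, m12 = i11 * b11 + i12 * b12, i11 * b12 + i12 * b22
    m21, m22 = i12 * b11 + i22 * b12, i12 * b12 + i22 * b22
    c = kappa - 1.0
    v11 = c * (m11 * i11 + m12 * i12)
    v12 = c * (m11 * i12 + m12 * i22)
    v22 = c * (m21 * i12 + m22 * i22)
    return [[v11, v12], [v12, v22]]


print(ols_variance_estimate_arch1(simulate_arch1(50000, omega=1.0, alpha=0.1, seed=1)))
```

For $\omega_0 = 1$ and $\alpha_{01} = 0.1$ the printed matrix can be compared with the corresponding entry of Table 6.2.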


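Example 6.1 (ARCH(1)) When $q = 1$ the moment conditions OLS2 and OLS4 take the form $\kappa_\eta\alpha_{01}^2 < 1$ and $\mu_8\alpha_{01}^4 < 1$, with $\mu_8 = E\eta_t^8$ (see (2.54)). We have
$$A = \begin{pmatrix}1 & E\epsilon_t^2\\ E\epsilon_t^2 & E\epsilon_t^4\end{pmatrix}, \qquad B = \begin{pmatrix}E\sigma_t^4 & E\sigma_t^4\epsilon_{t-1}^2\\ E\sigma_t^4\epsilon_{t-1}^2 & E\sigma_t^4\epsilon_{t-1}^4\end{pmatrix},$$
with
$$E\epsilon_t^2 = \frac{\omega_0}{1 - \alpha_{01}}, \qquad E\epsilon_t^4 = \kappa_\eta E\sigma_t^4, \qquad E\sigma_t^4 = \frac{\omega_0^2(1 + \alpha_{01})}{(1 - \kappa_\eta\alpha_{01}^2)(1 - \alpha_{01})}.$$
The other terms of the matrix $B$ are obtained by expanding $\sigma_t^4 = (\omega_0 + \alpha_{01}\epsilon_{t-1}^2)^2$ and calculating the moments of order 6 and 8 of $\epsilon_t$.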

Table 6.1 shows, for different laws of the iid process, that the moment conditions OLS2 and OLS4 impose strong constraints on the parameter space.

Table 6.2 displays numerical values of the asymptotic variance, for different values of $\alpha_{01}$ and $\omega_0 = 1$, when $\eta_t$ follows the $\mathcal N(0,1)$ distribution.

The asymptotic accuracy of $\hat\theta_n$ becomes very low near the boundary of the domain of existence of $E\epsilon_t^8$. The OLS method can, however, be used for higher values of $\alpha_{01}$, because the estimator remains consistent when $\alpha_{01} < 3^{-1/2} = 0.577$, and thus can provide initial values for an algorithm maximizing the likelihood.


Table 6.1 Strict stationarity and moment conditions for the ARCH(1) model when $\eta_t$ follows the $\mathcal N(0,1)$ distribution or the Student $t_\nu$ distribution (normalized in such a way that $E\eta_t^2 = 1$).

          Strict stationarity   Eε_t² < ∞    Eε_t⁴ < ∞     Eε_t⁸ < ∞
Normal    α01 < 3.562           α01 < 1      α01 < 0.577   α01 < 0.312
t3        α01 < 7.389           α01 < 1      no            no
t5        α01 < 4.797           α01 < 1      α01 < 0.333   no
t9        α01 < 4.082           α01 < 1      α01 < 0.488   α01 < 0.143

‘no’ means that the moment condition is not satisfied.
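The entries of Table 6.1 can be recomputed from the conditions used in the text. Strict stationarity of the ARCH(1) holds when $E\log(\alpha_{01}\eta_t^2) < 0$. Finite second, fourth and eighth moments hold when $\alpha_{01} < 1$, $\kappa_\eta\alpha_{01}^2 < 1$ and $\mu_8\alpha_{01}^4 < 1$ respectively. The Python sketch below does this. It relies on some assumptions of our own:

* The normalized Student moments $\kappa_\eta = 3(\nu-2)/(\nu-4)$ and $\mu_8 = 105(\nu-2)^3/\{(\nu-4)(\nu-6)(\nu-8)\}$ are used.
* The identity $E\log t_\nu^2 = \log\nu + \psi(1/2) - \psi(\nu/2)$ is used.
* $\nu$ is restricted to odd integers, so that the digamma difference becomes a finite sum.
* The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def log_and_moments(law):
    """Return (E log eta^2, E eta^4, E eta^8) for eta normalized so that E eta^2 = 1.

    law is "normal" or ("student", nu) with nu an odd integer >= 3; for odd nu
    the digamma difference psi(nu/2) - psi(1/2) is the finite sum below."""
    if law == "normal":
        # E log chi2_1 = psi(1/2) + log 2 = -EULER_GAMMA - log 2
        return -EULER_GAMMA - math.log(2.0), 3.0, 105.0
    _, nu = law
    psi_diff = sum(2.0 / (2 * k - 1) for k in range(1, (nu - 1) // 2 + 1))
    # eta = t_nu * sqrt((nu - 2)/nu) and E log t_nu^2 = log(nu) - psi_diff
    e_log = math.log(nu - 2) - psi_diff
    kappa = 3.0 * (nu - 2) / (nu - 4) if nu > 4 else math.inf
    mu8 = (105.0 * (nu - 2) ** 3 / ((nu - 4) * (nu - 6) * (nu - 8))
           if nu > 8 else math.inf)
    return e_log, kappa, mu8


def arch1_bounds(law):
    """Upper bounds on alpha_01: strict stationarity, E eps^2, E eps^4, E eps^8.
    None stands for 'no' (the moment is infinite for every alpha_01 > 0)."""
    e_log, kappa, mu8 = log_and_moments(law)
    strict = math.exp(-e_log)                        # E log(alpha eta^2) < 0
    fourth = kappa ** -0.5 if math.isfinite(kappa) else None
    eighth = mu8 ** -0.25 if math.isfinite(mu8) else None
    return strict, 1.0, fourth, eighth


for law in ["normal", ("student", 3), ("student", 5), ("student", 9)]:
    print(law, arch1_bounds(law))
```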


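Table 6.2 Asymptotic variance of the OLS estimator of an ARCH(1) model with $\omega_0 = 1$, when $\eta_t \sim \mathcal N(0,1)$.

$$\mathrm{Var}_{as}\{\sqrt n(\hat\theta_n - \theta_0)\}: \quad \begin{pmatrix}3.98 & -1.85\\ -1.85 & 2.15\end{pmatrix}\ (\alpha_{01} = 0.1), \qquad \begin{pmatrix}8.03 & -5.26\\ -5.26 & 5.46\end{pmatrix}\ (\alpha_{01} = 0.2), \qquad \begin{pmatrix}151.0 & -106.5\\ -106.5 & 77.6\end{pmatrix}\ (\alpha_{01} = 0.3).$$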
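The entries of Table 6.2 follow from Theorem 6.2 and the moment formulas of Example 6.1. Taking expectations in $\sigma_t^{2k} = (\omega_0 + \alpha_{01}\epsilon_{t-1}^2)^k$ gives, for $k = 1,\dots,4$, the moments $E\sigma_t^{2k}$ and $E\epsilon_t^{2k} = E\eta_t^{2k}\,E\sigma_t^{2k}$ recursively. This is our own way of organizing the computation, sketched below in Python with a hypothetical function name. The default moments are those of the $\mathcal N(0,1)$ law.

```python
import math


def arch1_ols_asymptotic_variance(omega, alpha, mu=(1.0, 3.0, 15.0, 105.0)):
    """(kappa - 1) A^{-1} B A^{-1} for an ARCH(1) (Theorem 6.2, Example 6.1).

    mu = (E eta^2, E eta^4, E eta^6, E eta^8); the default is the N(0, 1) law.
    Requires mu[3] * alpha**4 < 1, i.e. E eps_t^8 < infinity (assumption OLS4)."""
    if not mu[3] * alpha ** 4 < 1.0:
        raise ValueError("E eps_t^8 is infinite: OLS4 fails")
    # s[k] = E sigma_t^{2k}, m[k] = E eps_t^{2k} = mu_{2k} s[k]; the binomial
    # expansion of (omega + alpha eps_{t-1}^2)^k and stationarity give s[k]
    # from m[0], ..., m[k-1].
    s, m = [1.0], [1.0]
    for k in range(1, 5):
        rest = sum(math.comb(k, j) * omega ** (k - j) * alpha ** j * m[j]
                   for j in range(k))
        s.append(rest / (1.0 - alpha ** k * mu[k - 1]))
        m.append(mu[k - 1] * s[k])
    # A = E Z Z' and B = E sigma_t^4 Z Z', with Z = (1, eps_{t-1}^2)'
    a11, a12, a22 = 1.0, m[1], m[2]
    b11 = s[2]
    b12 = omega ** 2 * m[1] + 2 * omega * alpha * m[2] + alpha ** 2 * m[3]
    b22 = omega ** 2 * m[2] + 2 * omega * alpha * m[3] + alpha ** 2 * m[4]
    # A^{-1} B A^{-1} = C B C / det(A)^2 with C = [[a22, -a12], [-a12, a11]]
    det = a11 * a22 - a12 * a12
    c, d = a22, -a12
    v11 = c * c * b11 + 2 * c * d * b12 + d * d * b22
    v12 = c * d * b11 + (d * d + c * a11) * b12 + d * a11 * b22
    v22 = d * d * b11 + 2 * d * a11 * b12 + a11 * a11 * b22
    factor = (mu[1] - 1.0) / det ** 2
    return [[factor * v11, factor * v12], [factor * v12, factor * v22]]


for a in (0.1, 0.2, 0.3):
    print(a, arch1_ols_asymptotic_variance(1.0, a))
```

The function raises an error beyond the bound $\alpha_{01} < 0.312$ of Table 6.1. This reflects the loss of asymptotic normality, not a numerical issue.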
6.2 Estimation of ARCH(q) Models by Feasible Generalized Least Squares

In a linear regression model where, conditionally on the exogenous variables, the errors are heteroscedastic, the FGLS estimator is asymptotically more accurate than the OLS estimator. Note that in (6.3) the errors $u_t$ are, conditionally on $Z_{t-1}$, heteroscedastic with conditional variance $\mathrm{Var}(u_t \mid Z_{t-1}) = (\kappa_\eta - 1)\sigma_t^4$.

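For all $\theta = (\omega, \alpha_1,\dots,\alpha_q)'$, let
$$\sigma_t^2(\theta) = \omega + \sum_{i=1}^q\alpha_i\epsilon_{t-i}^2 \qquad\text{and}\qquad \hat\Omega = \mathrm{diag}\left(\sigma_1^{-4}(\hat\theta_n),\dots,\sigma_n^{-4}(\hat\theta_n)\right).$$
The FGLS estimator is defined by
$$\tilde\theta_n = (X'\hat\Omega X)^{-1}X'\hat\Omega Y.$$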
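The definition translates into two steps: an OLS fit, then weighted least squares with the weights $\sigma_t^{-4}(\hat\theta_n)$ on the diagonal of $\hat\Omega$. The Python sketch below implements these steps for $q = 1$. It is an illustration rather than a reference implementation, and several choices are ours:

* The helper names are hypothetical.
* $\epsilon_0^2 = 0$ is used as initial value.
* The simulator uses Gaussian $\eta_t$.
* The function stops if the first-step fit produces a nonpositive $\sigma_t^2(\hat\theta_n)$. This cannot happen for $n$ large when $\alpha_{01} > 0$, but it can in small samples.

```python
import random


def simulate_arch1(n, omega, alpha, burn=500, seed=0):
    """Simulated ARCH(1) path with Gaussian eta_t (illustrative assumption)."""
    rng = random.Random(seed)
    prev_eps2, out = 0.0, []
    for t in range(n + burn):
        eps = (omega + alpha * prev_eps2) ** 0.5 * rng.gauss(0.0, 1.0)
        prev_eps2 = eps * eps
        if t >= burn:
            out.append(eps)
    return out


def fgls_arch1(eps):
    """Two-step FGLS estimator (omega_tilde, alpha_tilde) of an ARCH(1)."""
    y = [e * e for e in eps]
    x = [0.0] + y[:-1]                       # eps_{t-1}^2, with eps_0^2 = 0

    def wls(weights):
        # Solve (X' W X) theta = X' W Y for the design X = (1, eps_{t-1}^2)
        s1 = sum(weights)
        sx = sum(w * xt for w, xt in zip(weights, x))
        sxx = sum(w * xt * xt for w, xt in zip(weights, x))
        sy = sum(w * yt for w, yt in zip(weights, y))
        sxy = sum(w * xt * yt for w, xt, yt in zip(weights, x, y))
        det = s1 * sxx - sx * sx
        return (sxx * sy - sx * sxy) / det, (s1 * sxy - sx * sy) / det

    omega_ols, alpha_ols = wls([1.0] * len(y))          # step 1: OLS (6.4)
    sig2 = [omega_ols + alpha_ols * xt for xt in x]       # sigma_t^2(theta_hat_n)
    if min(sig2) <= 0:
        raise ValueError("first-step OLS gives a non-positive sigma_t^2")
    return wls([v ** -2 for v in sig2])                    # step 2: weights sigma_t^{-4}


print(fgls_arch1(simulate_arch1(20000, omega=1.0, alpha=0.3, seed=5)))
```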
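Theorem 6.3 (Asymptotic properties of the FGLS estimator) Under assumptions OLS1–OLS3 and if $\alpha_{0i} > 0$, $i = 1,\dots,q$,
$$\tilde\theta_n \to \theta_0 \ \text{a.s.}, \qquad \sqrt n(\tilde\theta_n - \theta_0) \xrightarrow{\mathcal L} \mathcal N\left(0, (\kappa_\eta - 1)J^{-1}\right),$$
where $J = E(\sigma_t^{-4}Z_{t-1}Z_{t-1}')$ is positive definite.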


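Proof. It can be shown that $J$ is positive definite by the argument used in Theorem 6.1. We have
$$\begin{aligned}\tilde\theta_n &= \left(\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}Z_{t-1}'\right)^{-1}\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}\epsilon_t^2\\ &= \left(\frac1n\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}Z_{t-1}'\right)^{-1}\left[\frac1n\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}\left(Z_{t-1}'\theta_0 + u_t\right)\right]\\ &= \theta_0 + \left(\frac1n\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}Z_{t-1}'\right)^{-1}\left\{\frac1n\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}u_t\right\}.\end{aligned} \tag{6.8}$$
A Taylor expansion around $\theta_0$ yields, with $\sigma_t^2 = \sigma_t^2(\theta_0)$,
$$\sigma_t^{-4}(\hat\theta_n) = \sigma_t^{-4} - 2\sigma_t^{-6}(\theta^*)\frac{\partial\sigma_t^2}{\partial\theta'}(\theta^*)(\hat\theta_n - \theta_0), \tag{6.9}$$
where $\theta^*$ is between $\hat\theta_n$ and $\theta_0$. Note that, for all $\theta$, $\frac{\partial\sigma_t^2}{\partial\theta}(\theta) = Z_{t-1}$. It follows that
$$\frac1n\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}Z_{t-1}' = \frac1n\sum_{t=1}^n\sigma_t^{-4}Z_{t-1}Z_{t-1}' - \frac2n\sum_{t=1}^n\sigma_t^{-6}(\theta^*)Z_{t-1}Z_{t-1}' \times Z_{t-1}'(\hat\theta_n - \theta_0).$$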


The first term on the right-hand side of the equality converges a.s. to $J$ by the ergodic theorem. The second term converges a.s. to 0 because the OLS estimator is consistent and
$$\left\|\frac2n\sum_{t=1}^n\sigma_t^{-6}(\theta^*)Z_{t-1}Z_{t-1}' \times Z_{t-1}'(\hat\theta_n - \theta_0)\right\| \le \left(\frac2n\sum_{t=1}^n\left\|\sigma_t^{-2}(\theta^*)Z_{t-1}\right\|^3\right)\|\hat\theta_n - \theta_0\| \le K\|\hat\theta_n - \theta_0\|$$
for $n$ large enough. The constant bound $K$ is obtained by arguing that the components of $\hat\theta_n$, and thus those of $\theta^*$, are strictly positive for $n$ large enough (because $\hat\theta_n \to \theta_0$ a.s.). Thus, we have $\sigma_t^{-2}(\theta^*)\epsilon_{t-i}^2 < 1/\alpha_i^*$, for $i = 1,\dots,q$, and finally $\|\sigma_t^{-2}(\theta^*)Z_{t-1}\|$ is bounded. We have shown
$$\left(\frac1n\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}Z_{t-1}'\right)^{-1} \to J^{-1}, \quad \text{a.s.} \tag{6.10}$$
i=l 


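For the term in braces in (6.8) we have
$$\frac1n\sum_{t=1}^n\sigma_t^{-4}(\hat\theta_n)Z_{t-1}u_t = \frac1n\sum_{t=1}^n\sigma_t^{-4}Z_{t-1}u_t - \frac2n\sum_{t=1}^n\sigma_t^{-6}(\theta^*)Z_{t-1}u_t \times Z_{t-1}'(\hat\theta_n - \theta_0) \to 0, \quad \text{a.s.}, \tag{6.11}$$
by the previous arguments, noting that $E(\sigma_t^{-4}Z_{t-1}u_t) = 0$ and
$$\begin{aligned}\left\|\frac2n\sum_{t=1}^n\sigma_t^{-6}(\theta^*)Z_{t-1}u_t \times Z_{t-1}'(\hat\theta_n - \theta_0)\right\| &= \left\|\frac2n\sum_{t=1}^n\sigma_t^{-6}(\theta^*)Z_{t-1}\sigma_t^2(\theta_0)(\eta_t^2 - 1) \times Z_{t-1}'(\hat\theta_n - \theta_0)\right\|\\ &\le K\left(\frac1n\sum_{t=1}^n\sigma_t^2|\eta_t^2 - 1|\right)\|\hat\theta_n - \theta_0\| \to 0, \quad \text{a.s.},\end{aligned}$$
since the average in parentheses converges a.s., by the ergodic theorem, to the finite limit $E(\sigma_t^2)E|\eta_t^2 - 1|$.
Thus, we have shown that $\tilde\theta_n \to \theta_0$, a.s.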
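Using (6.11), (6.8) and (6.10), we have
$$\sqrt n(\tilde\theta_n - \theta_0) = (J^{-1} + R_n)\frac{1}{\sqrt n}\sum_{t=1}^n\sigma_t^{-4}Z_{t-1}u_t - \frac2n(J^{-1} + R_n)\sum_{t=1}^n\sigma_t^{-6}(\theta^*)Z_{t-1}u_t \times Z_{t-1}'\sqrt n(\hat\theta_n - \theta_0),$$
where $R_n \to 0$, a.s. A new expansion around $\theta_0$ gives
$$\sigma_t^{-6}(\theta^*) = \sigma_t^{-6} - 3\sigma_t^{-8}(\theta^{**})Z_{t-1}'(\theta^* - \theta_0), \tag{6.12}$$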
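where $\theta^{**}$ is between $\theta^*$ and $\theta_0$. It follows that
$$\begin{aligned}\sqrt n(\tilde\theta_n - \theta_0) &= (J^{-1} + R_n)\frac{1}{\sqrt n}\sum_{t=1}^n\sigma_t^{-2}Z_{t-1}(\eta_t^2 - 1)\\ &\quad - \frac2n(J^{-1} + R_n)\sum_{t=1}^n\sigma_t^{-4}Z_{t-1}(\eta_t^2 - 1) \times Z_{t-1}'\sqrt n(\hat\theta_n - \theta_0)\\ &\quad + \frac{6}{n\sqrt n}(J^{-1} + R_n)\sum_{t=1}^n\sigma_t^{-8}(\theta^{**})Z_{t-1}u_t \times \left\{Z_{t-1}'\sqrt n(\hat\theta_n - \theta_0)\right\} \times \left\{Z_{t-1}'\sqrt n(\theta^* - \theta_0)\right\}\\ &= S_{n1} + S_{n2} + S_{n3}.\end{aligned} \tag{6.13}$$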


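The CLT applied to the ergodic and square integrable stationary martingale difference $\sigma_t^{-2}Z_{t-1}(\eta_t^2 - 1)$ shows that $S_{n1}$ converges in distribution to a Gaussian vector with zero mean and variance
$$J^{-1}E\left\{\sigma_t^{-4}(\eta_t^2 - 1)^2Z_{t-1}Z_{t-1}'\right\}J^{-1} = (\kappa_\eta - 1)J^{-1}$$
(see Corollary A.1). Moreover, writing $\hat\theta_n = (\hat\omega_n, \hat\alpha_{n1},\dots,\hat\alpha_{nq})'$,
$$\begin{aligned}\frac1n\sum_{t=1}^n\sigma_t^{-4}Z_{t-1}(\eta_t^2 - 1) \times Z_{t-1}'\sqrt n(\hat\theta_n - \theta_0) &= \left\{\frac1n\sum_{t=1}^n\sigma_t^{-4}Z_{t-1}(\eta_t^2 - 1)\right\}\sqrt n(\hat\omega_n - \omega_0)\\ &\quad + \sum_{j=1}^q\left\{\frac1n\sum_{t=1}^n\sigma_t^{-4}Z_{t-1}(\eta_t^2 - 1)\epsilon_{t-j}^2\right\}\sqrt n(\hat\alpha_{nj} - \alpha_{0j}).\end{aligned}$$
The two terms in braces tend to 0 a.s. by the ergodic theorem. Moreover, the terms $\sqrt n(\hat\omega_n - \omega_0)$ and $\sqrt n(\hat\alpha_{nj} - \alpha_{0j})$ are bounded in probability, as well as $J^{-1} + R_n$. It follows that $S_{n2}$ tends to 0 in probability. Finally, by arguments already used and because $\theta^*$ is between $\hat\theta_n$ and $\theta_0$,
$$\|S_{n3}\| \le \frac{K}{\sqrt n}\left\|J^{-1} + R_n\right\|\left\|\sqrt n(\hat\theta_n - \theta_0)\right\|^2\frac1n\sum_{t=1}^n|\eta_t^2 - 1| \to 0,$$
in probability. Using (6.13), we have shown the convergence in law of the theorem.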


The moment condition required for the asymptotic normality of the FGLS estimator is $E(\epsilon_t^4) < \infty$. For the OLS estimator we had the more restrictive condition $E\epsilon_t^8 < \infty$. Moreover, when this eighth-order moment exists, the following result shows that the OLS estimator is asymptotically less accurate than the FGLS estimator.


Theorem 6.4 (Asymptotic OLS versus FGLS variances) Under assumptions OLS1–OLS4, the matrix
$$A^{-1}BA^{-1} - J^{-1}$$
is positive semi-definite.
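Proof. Let $D = \sigma_t^2A^{-1}Z_{t-1} - \sigma_t^{-2}J^{-1}Z_{t-1}$. Then
$$\begin{aligned}E(DD') &= A^{-1}E(\sigma_t^4Z_{t-1}Z_{t-1}')A^{-1} + J^{-1}E(\sigma_t^{-4}Z_{t-1}Z_{t-1}')J^{-1} - A^{-1}E(Z_{t-1}Z_{t-1}')J^{-1} - J^{-1}E(Z_{t-1}Z_{t-1}')A^{-1}\\ &= A^{-1}BA^{-1} + J^{-1} - 2J^{-1}\\ &= A^{-1}BA^{-1} - J^{-1}\end{aligned}$$
is positive semi-definite, and the result follows.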
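The proof uses only linearity of the expectation. It therefore also holds when every expectation is replaced by a sample mean, so the empirical analogue of $A^{-1}BA^{-1} - J^{-1}$ is positive semi-definite for any sample. The Python sketch below checks this numerically for a simulated ARCH(1). It evaluates $\sigma_t^2$ at the true parameter, which is possible only in a simulation. The Gaussian noise and the helper names are illustrative assumptions.

```python
import random


def simulate_arch1(n, omega, alpha, burn=500, seed=0):
    """Simulated ARCH(1) path with Gaussian eta_t (illustrative assumption)."""
    rng = random.Random(seed)
    prev_eps2, out = 0.0, []
    for t in range(n + burn):
        eps = (omega + alpha * prev_eps2) ** 0.5 * rng.gauss(0.0, 1.0)
        prev_eps2 = eps * eps
        if t >= burn:
            out.append(eps)
    return out


def gap_ols_minus_fgls(eps, omega0, alpha0):
    """Sample version of A^{-1} B A^{-1} - J^{-1} for an ARCH(1), returned as
    (g11, g12, g22), using sigma_t^2 = omega0 + alpha0 * eps_{t-1}^2, eps_0^2 = 0."""
    x = [0.0] + [e * e for e in eps[:-1]]
    s2 = [omega0 + alpha0 * xt for xt in x]
    n = len(x)

    def moment(weights):
        # n^{-1} sum_t weights_t Z_{t-1} Z_{t-1}', with Z_{t-1} = (1, x_t)'
        return (sum(weights) / n,
                sum(w * xt for w, xt in zip(weights, x)) / n,
                sum(w * xt * xt for w, xt in zip(weights, x)) / n)

    def inv(m):
        a, b, c = m                          # symmetric [[a, b], [b, c]]
        d = a * c - b * b
        return (c / d, -b / d, a / d)

    def sandwich(p, q):
        # p q p for symmetric 2 x 2 matrices stored as (m11, m12, m22)
        p11, p12, p22 = p
        q11, q12, q22 = q
        r11, r12 = p11 * q11 + p12 * q12, p11 * q12 + p12 * q22
        r21, r22 = p12 * q11 + p22 * q12, p12 * q12 + p22 * q22
        return (r11 * p11 + r12 * p12, r11 * p12 + r12 * p22, r21 * p12 + r22 * p22)

    a_inv = inv(moment([1.0] * n))                 # A^{-1}
    b = moment([v * v for v in s2])                # B (weights sigma_t^4)
    j_inv = inv(moment([v ** -2 for v in s2]))     # J^{-1} (weights sigma_t^{-4})
    return tuple(o - f for o, f in zip(sandwich(a_inv, b), j_inv))


print(gap_ols_minus_fgls(simulate_arch1(20000, 1.0, 0.2, seed=3), 1.0, 0.2))
```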


We will see in Chapter 7 that the asymptotic variance of the FGLS estimator coincides with that 
of the quasi-maximum likelihood estimator (but the asymptotic normality of the latter is obtained 
without moment conditions). This result explains why quasi-maximum likelihood is preferred to 
OLS (and even to FGLS) for the estimation of ARCH (and GARCH) models. Note, however, that 
the OLS estimator often provides a good initial value for the optimization algorithm required for 
the quasi-maximum likelihood method. 


6.3 Estimation by Constrained Ordinary Least Squares

Negative components are not precluded in the OLS estimator $\hat\theta_n$ defined by (6.4) (see Exercise 6.3). When the estimate has negative components, predictions of the volatility can be negative. In order to avoid this problem, we consider the constrained OLS estimator defined by
$$\hat\theta_n^c = \arg\min_{\theta\in[0,\infty)^{q+1}}Q_n(\theta), \qquad Q_n(\theta) = \frac1n\|Y - X\theta\|^2.$$
The existence of $\hat\theta_n^c$ is guaranteed by the continuity of the function $Q_n$ and the fact that
$$\{nQ_n(\theta)\}^{1/2} \ge \|X\theta\| - \|Y\| \to \infty$$
as $\|\theta\| \to \infty$ and $\theta \ge 0$, whenever $X$ has nonzero columns (the entries of $X$ being nonnegative, $\|X\theta\|$ is at least $\theta_j$ times the norm of the $j$th column of $X$, for each $j$). Note that the latter condition is satisfied at least for $n$ large enough (see Exercise 6.5).


6.3.1 Properties of the Constrained OLS Estimator 


The following theorem gives a condition for equality between the constrained and unconstrained 
estimators. The theorem is stated in the ARCH case but is true in a much more general framework. 


Theorem 6.5 (Equality between constrained and unconstrained OLS) If $X$ is of rank $q+1$, the constrained and unconstrained estimators coincide, $\hat\theta_n^c = \hat\theta_n$, if and only if $\hat\theta_n \in [0,+\infty)^{q+1}$.

Proof. Since $\hat\theta_n$ and $\hat\theta_n^c$ are obtained by minimizing the same function $Q_n(\cdot)$, and since $\hat\theta_n^c$ minimizes this function on a smaller set, we have $Q_n(\hat\theta_n) \le Q_n(\hat\theta_n^c)$. Moreover, $\hat\theta_n^c \in [0,+\infty)^{q+1}$, and we have $Q_n(\theta) \ge Q_n(\hat\theta_n^c)$ for all $\theta \in [0,+\infty)^{q+1}$.

Suppose that the unconstrained estimate $\hat\theta_n$ belongs to $[0,+\infty)^{q+1}$. In this case $Q_n(\hat\theta_n) = Q_n(\hat\theta_n^c)$. Because the unconstrained solution is unique, $\hat\theta_n^c = \hat\theta_n$.

The converse is trivial.


We now give a way to obtain the constrained estimator from the unconstrained estimator. 
Theorem 6.6 (Constrained OLS as a projection of OLS) If $X$ has rank $q+1$, the constrained estimator $\hat\theta_n^c$ is the orthogonal projection of $\hat\theta_n$ on $[0,+\infty)^{q+1}$ with respect to the metric $X'X$, that is,
$$\hat\theta_n^c = \arg\min_{\theta\in[0,+\infty)^{q+1}}(\hat\theta_n - \theta)'X'X(\hat\theta_n - \theta). \tag{6.14}$$

Proof. If we denote by $P$ the orthogonal projector on the columns of $X$, and $M = I_n - P$, we have
$$nQ_n(\theta) = \|Y - X\theta\|^2 = \|P(Y - X\theta)\|^2 + \|M(Y - X\theta)\|^2 = \|X(\hat\theta_n - \theta)\|^2 + \|MY\|^2,$$
using properties of projections, Pythagoras's theorem and $PY = X\hat\theta_n$. The constrained estimate $\hat\theta_n^c$ thus solves (6.14). Note that, since $X$ has full column rank, a norm is well defined by $\|x\|_{X'X} = \sqrt{x'X'Xx}$. The characterization (6.14) is equivalent to
$$\hat\theta_n^c \in [0,+\infty)^{q+1}, \qquad \|\hat\theta_n^c - \hat\theta_n\|_{X'X} \le \|\hat\theta_n - \theta\|_{X'X}, \quad \forall\theta \in [0,+\infty)^{q+1}. \tag{6.15}$$
Since $[0,+\infty)^{q+1}$ is closed and convex, $\hat\theta_n^c$ exists, is unique and is the $X'X$-orthogonal projection of $\hat\theta_n$ on $[0,+\infty)^{q+1}$. This projection is characterized by
$$\hat\theta_n^c \in [0,+\infty)^{q+1} \quad\text{and}\quad \left\langle\hat\theta_n - \hat\theta_n^c, \hat\theta_n^c - \theta\right\rangle_{X'X} \ge 0, \quad \forall\theta \in [0,+\infty)^{q+1} \tag{6.16}$$
(see Exercise 6.9). This characterization shows that, when $\hat\theta_n \notin [0,+\infty)^{q+1}$, the constrained estimate $\hat\theta_n^c$ must lie at the boundary of $[0,+\infty)^{q+1}$. Otherwise it suffices to take $\theta \in [0,+\infty)^{q+1}$ between $\hat\theta_n^c$ and $\hat\theta_n$ to obtain a strictly negative scalar product.


The characterization (6.15) allows us to easily obtain the strong consistency of the constrained estimator.

Theorem 6.7 (Consistency of the constrained OLS estimator) Under the assumptions of Theorem 6.1, almost surely,
$$\hat\theta_n^c \to \theta_0 \quad \text{as } n \to \infty.$$
Proof. Since $\theta_0 \in [0,+\infty)^{q+1}$, in view of (6.15) we have
$$\|\hat\theta_n - \hat\theta_n^c\|_{X'X/n} \le \|\hat\theta_n - \theta_0\|_{X'X/n}.$$
It follows that, using the triangle inequality,
$$\|\hat\theta_n^c - \theta_0\|_{X'X/n} \le \|\hat\theta_n^c - \hat\theta_n\|_{X'X/n} + \|\hat\theta_n - \theta_0\|_{X'X/n} \le 2\|\hat\theta_n - \theta_0\|_{X'X/n}.$$
Since, in view of Theorem 6.1, $\hat\theta_n \to \theta_0$ a.s. and $X'X/n$ converges a.s. to a positive definite matrix, it follows that $\|\hat\theta_n - \theta_0\|_{X'X/n} \to 0$ and thus that $\|\hat\theta_n^c - \theta_0\|_{X'X/n} \to 0$ a.s. Using Exercise 6.12, the conclusion follows.
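6.3.2 Computation of the Constrained OLS Estimator

We now give an explicit way to obtain the constrained estimator. We have already seen that if all the components of the unconstrained estimator $\hat\theta_n$ are nonnegative, we have $\hat\theta_n^c = \hat\theta_n$. Now suppose that one component of $\hat\theta_n$ is negative, for instance the last one. Let
$$X = \left(X^{(1)}, X^{(2)}\right), \qquad X^{(1)} = \begin{pmatrix}1 & \epsilon_0^2 & \cdots & \epsilon_{-q+2}^2\\ \vdots & \vdots & & \vdots\\ 1 & \epsilon_{n-1}^2 & \cdots & \epsilon_{n-q+1}^2\end{pmatrix},$$
and
$$\hat\theta_n = (X'X)^{-1}X'Y = \begin{pmatrix}\hat\theta_n^{(1)}\\ \hat\alpha_q\end{pmatrix}, \qquad \tilde\theta_n = \begin{pmatrix}\tilde\theta_n^{(1)}\\ 0\end{pmatrix}, \qquad \tilde\theta_n^{(1)} = \left(X^{(1)\prime}X^{(1)}\right)^{-1}X^{(1)\prime}Y.$$
Note that $\tilde\theta_n^{(1)} \neq \hat\theta_n^{(1)}$ in general (see Exercise 6.11).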


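Theorem 6.8 (Explicit form of the constrained estimator) Assume that $X$ has rank $q+1$ and $\hat\alpha_q < 0$. Then
$$\tilde\theta_n^{(1)} \in [0,+\infty)^q \implies \hat\theta_n^c = \tilde\theta_n.$$
Proof. Let $P^{(1)} = X^{(1)}\left(X^{(1)\prime}X^{(1)}\right)^{-1}X^{(1)\prime}$ be the projector on the columns of $X^{(1)}$ and let $M^{(1)} = I - P^{(1)}$. We have
$$\begin{aligned}\tilde\theta_n'X' &= \left(\tilde\theta_n^{(1)\prime}, 0\right)\begin{pmatrix}X^{(1)\prime}\\ X^{(2)\prime}\end{pmatrix} = Y'P^{(1)},\\ \tilde\theta_n'X'X &= \left(Y'X^{(1)}, Y'P^{(1)}X^{(2)}\right),\\ \hat\theta_n'X'X &= Y'X = \left(Y'X^{(1)}, Y'X^{(2)}\right),\\ (\hat\theta_n - \tilde\theta_n)'X'X &= \left(0, Y'M^{(1)}X^{(2)}\right).\end{aligned}$$
Because $\hat\theta_n'e_{q+1} < 0$, with $e_{q+1} = (0,\dots,0,1)'$, we have $(\hat\theta_n - \tilde\theta_n)'e_{q+1} < 0$. This can be written as
$$(\hat\theta_n - \tilde\theta_n)'X'X(X'X)^{-1}e_{q+1} < 0,$$
or alternatively
$$Y'M^{(1)}X^{(2)}\left\{(X'X)^{-1}\right\}_{q+1,q+1} < 0.$$
Thus $Y'M^{(1)}X^{(2)} < 0$, because the diagonal element $\{(X'X)^{-1}\}_{q+1,q+1}$ of the positive definite matrix $(X'X)^{-1}$ is positive. It follows that for all $\theta = (\theta^{(1)\prime}, \theta^{(2)})'$ such that $\theta^{(2)} \in [0,\infty)$,
$$\left\langle\hat\theta_n - \tilde\theta_n, \tilde\theta_n - \theta\right\rangle_{X'X} = (\hat\theta_n - \tilde\theta_n)'X'X(\tilde\theta_n - \theta) = \left(0, Y'M^{(1)}X^{(2)}\right)\begin{pmatrix}\tilde\theta_n^{(1)} - \theta^{(1)}\\ -\theta^{(2)}\end{pmatrix} = -\theta^{(2)}Y'M^{(1)}X^{(2)} \ge 0.$$
In view of (6.16), we have $\hat\theta_n^c = \tilde\theta_n$ because $\tilde\theta_n \in [0,+\infty)^{q+1}$.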
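For $q = 1$, Theorems 6.5 and 6.8 give the constrained estimator in closed form. Let $\bar Y$ and $\bar x$ denote the sample means of $\epsilon_t^2$ and $\epsilon_{t-1}^2$. At most one component of $\hat\theta_n$ can be negative, since $\hat\alpha_1 < 0$ implies $\hat\omega = \bar Y - \hat\alpha_1\bar x \ge 0$. Moreover, the restricted regressions (on the constant only, or on $\epsilon_{t-1}^2$ without intercept) always give nonnegative coefficients. The Python sketch below uses these facts. One step is our own reading of the text: Theorem 6.8 is stated for the last component, and we apply it to whichever component is negative, on the grounds that the proof does not use the position of the dropped column. The function name is hypothetical, and the usage line uses a toy series with initial value $\epsilon_0^2 = 1$.

```python
def constrained_ols_arch1(eps2):
    """Unconstrained and constrained OLS estimates of (omega, alpha) for an ARCH(1).

    eps2 = [eps_0^2, eps_1^2, ..., eps_n^2], the initial value first.
    Returns ((omega_hat, alpha_hat), (omega_c, alpha_c))."""
    x, y = eps2[:-1], eps2[1:]                 # regressor eps_{t-1}^2, response eps_t^2
    n = len(y)
    sx, sxx, sy = sum(x), sum(v * v for v in x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx                    # nonzero iff X has rank 2
    omega = (sxx * sy - sx * sxy) / det
    alpha = (n * sxy - sx * sy) / det
    if omega >= 0 and alpha >= 0:              # Theorem 6.5
        return (omega, alpha), (omega, alpha)
    if alpha < 0:                              # Theorem 6.8: drop eps_{t-1}^2
        return (omega, alpha), (sy / n, 0.0)
    # omega < 0 (then alpha > 0): drop the constant, regress through the origin
    return (omega, alpha), (0.0, sxy / sxx)


print(constrained_ols_arch1([1.0, 0.0, 1.0, 0.0]))
# unconstrained (1.0, -1.0), constrained (1/3, 0.0)
```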
6.4 Bibliographical Notes

The OLS method was proposed by Engle (1982) for ARCH models. The asymptotic properties of the OLS estimator were established by Weiss (1984, 1986), in the ARMA-GARCH framework, under eighth-order moment assumptions. Pantula (1989) also studied the asymptotic properties of
the OLS method in the AR(1)-ARCH(q) case, and he gave an explicit form for the asymptotic 
variance. The FGLS method was developed, in the ARCH case, by Bose and Mukherjee (2003) 
(see also Gouriéroux, 1997). The convexity results used for the study of the constrained estimator 
can be found, for instance, in Moulin and Fogelman-Soulié (1979). 


6.5 Exercises 


6.1 (Estimating the ARCH(q) for q = 1, 2, ...)
Describe how to use the Durbin algorithm (B.7)–(B.9) to estimate an ARCH(q) model by OLS.

6.2 (Explicit expression for the OLS estimator of an ARCH process)
With the notation of Section 6.1, show that, when $X$ has rank $q+1$, the estimator $\hat\theta = (X'X)^{-1}X'Y$ is the unique solution of the minimization problem
$$\hat\theta = \arg\min_{\theta\in\mathbb R^{q+1}}\sum_{t=1}^n\left(\epsilon_t^2 - Z_{t-1}'\theta\right)^2.$$

6.3 (OLS estimator with negative values)
Give a numerical example (with, for instance, $n = 2$) showing that the unconstrained OLS estimator of the ARCH(q) parameters (with, for instance, $q = 1$) can take negative values.

6.4 (Unconstrained and constrained OLS estimator of an ARCH(2) process)
Consider the ARCH(2) model
$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega + \alpha_1\epsilon_{t-1}^2 + \alpha_2\epsilon_{t-2}^2.$$
Let $\hat\theta = (\hat\omega, \hat\alpha_1, \hat\alpha_2)'$ be the unconstrained OLS estimator of $\theta = (\omega, \alpha_1, \alpha_2)'$. Is it possible to have

1. $\hat\alpha_1 < 0$?

2. $\hat\alpha_1 < 0$ and $\hat\alpha_2 < 0$?

3. $\hat\omega < 0$, $\hat\alpha_1 < 0$ and $\hat\alpha_2 < 0$?

Let $\hat\theta^c = (\hat\omega^c, \hat\alpha_1^c, \hat\alpha_2^c)'$ be the OLS constrained estimator with $\hat\alpha_1^c \ge 0$ and $\hat\alpha_2^c \ge 0$. Consider the following numerical example with $n = 3$ observations and two initial values: $\epsilon_{-1}^2 = 0$, $\epsilon_0^2 = 1$, $\epsilon_1^2 = 0$, $\epsilon_2^2 = 1/2$, $\epsilon_3^2 = 1/2$. Compute $\hat\theta$ and $\hat\theta^c$ for these observations.

6.5 (The columns of the matrix X are nonzero)
Show that if $\omega_0 > 0$, the matrix $X$ cannot have a column equal to zero for $n$ large enough.

6.6 (Estimating an AR(1) with ARCH(q) errors)
Consider the model
$$X_t = \phi_0X_{t-1} + \epsilon_t, \qquad |\phi_0| < 1,$$
where $(\epsilon_t)$ is the strictly stationary solution of model (6.1) under the condition $E\epsilon_t^4 < \infty$. Show that the OLS estimator of $\phi_0$ is consistent and asymptotically normal. Is the assumption $E\epsilon_t^4 < \infty$ necessary in the case of iid errors?

6.7 (Inversion of a block matrix)
For a matrix partitioned as $A = \begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}$, show that the inverse (when it exists) is of the form
$$A^{-1} = \begin{pmatrix}A_{11}^{-1} + A_{11}^{-1}A_{12}FA_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}F\\ -FA_{21}A_{11}^{-1} & F\end{pmatrix},$$
where $F = \left(A_{22} - A_{21}A_{11}^{-1}A_{12}\right)^{-1}$.

6.8 (Does the OLS asymptotic variance depend on $\omega_0$?)

1. Show that for an ARCH(q) model $E(\epsilon_t^{2m})$ is proportional to $\omega_0^m$ (when it exists).

2. Using Exercise 6.7, show that, for an ARCH(q) model, the asymptotic variance of the OLS estimator of the $\alpha_{0i}$ does not depend on $\omega_0$.

3. Show that the asymptotic variance of the OLS estimator of $\omega_0$ is proportional to $\omega_0^2$.

6.9 (Properties of the projections on closed convex sets)
Let $E$ be a Hilbert space, with a scalar product $\langle\cdot,\cdot\rangle$ and a norm $\|\cdot\|$. When $C \subset E$ and $x \in E$, it is said that $x^* \in C$ is a best approximation of $x$ on $C$ if $\|x - x^*\| = \min_{y\in C}\|x - y\|$.

1. Show that if $C$ is closed and convex, $x^*$ exists and is unique. This point is then called the projection of $x$ on $C$.

2. Show that $x^*$ satisfies the so-called variational inequalities:
$$\forall y \in C, \qquad \langle x^* - x, x^* - y\rangle \le 0, \tag{6.17}$$
and prove that $x^*$ is the unique point of $C$ satisfying these inequalities.

6.10 (Properties of the projections on closed convex cones)
Recall that a subset $K$ of the vector space $E$ is a cone if, for all $x \in K$ and for all $\lambda \ge 0$, we have $\lambda x \in K$. Let $K$ be a closed convex cone of the Hilbert space $E$.

1. Show that the projection $x^*$ of $x$ on $K$ (see Exercise 6.9) is characterized by
$$x^* \in K, \qquad \langle x - x^*, x^*\rangle = 0, \qquad \langle x - x^*, y\rangle \le 0, \quad \forall y \in K. \tag{6.18}$$

2. Show that $x^*$ satisfies
(a) $\forall x \in E$, $\forall\lambda \ge 0$, $(\lambda x)^* = \lambda x^*$.
(b) $\forall x \in E$, $\|x\|^2 = \|x^*\|^2 + \|x - x^*\|^2$, thus $\|x^*\| \le \|x\|$.

6.11 (OLS estimation of a subvector of parameters)
Consider the linear model $Y = X\theta + U$ with the usual assumptions. Let $M_2$ be the matrix of the orthogonal projection on the orthogonal subspace of $X^{(2)}$, where $X = (X^{(1)}, X^{(2)})$. Show that the OLS estimator of $\theta^{(1)}$ (where $\theta = (\theta^{(1)\prime}, \theta^{(2)\prime})'$, with obvious notation) is
$$\hat\theta^{(1)} = \left(X^{(1)\prime}M_2X^{(1)}\right)^{-1}X^{(1)\prime}M_2Y.$$
6.12 (A matrix result used in the proof of Theorem 6.7)
Let $(J_n)$ be a sequence of symmetric $k\times k$ matrices converging to a positive definite matrix $J$. Let $(X_n)$ be a sequence of vectors in $\mathbb R^k$ such that $X_n'J_nX_n \to 0$. Show that $X_n \to 0$.

6.13 (Example of constrained estimator calculus)
Take the example of Exercise 6.3 and compute the constrained estimator.


Estimating GARCH Models 
by Quasi-Maximum Likelihood 


The quasi-maximum likelihood (QML) method is particularly relevant for GARCH models because it provides consistent and asymptotically normal estimators for strictly stationary GARCH processes under mild regularity conditions, but with no moment assumptions on the observed process. By contrast, the least-squares methods of the previous chapter require moments of order 4 at least.

In this chapter, we study in detail the conditional QML method (conditional on initial values). We first consider the case when the observed process is pure GARCH. We present an iterative procedure for computing the Gaussian log-likelihood, conditionally on fixed or random initial values. The likelihood is written as if the law of the variables $\eta_t$ were Gaussian $\mathcal N(0,1)$ (we refer to pseudo- or quasi-likelihood), but this assumption is not necessary for the strong consistency of the estimator. In the second part of the chapter, we will study the application of the method to the estimation of ARMA-GARCH models. The asymptotic properties of the quasi-maximum likelihood estimator (QMLE) are established at the end of the chapter.


7.1 Conditional Quasi-Likelihood 


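Assume that the observations $\epsilon_1,\dots,\epsilon_n$ constitute a realization (of length $n$) of a GARCH(p, q) process, more precisely a nonanticipative strictly stationary solution of
$$\begin{cases}\epsilon_t = \sqrt{h_t}\,\eta_t\\ h_t = \omega_0 + \sum_{i=1}^q\alpha_{0i}\epsilon_{t-i}^2 + \sum_{j=1}^p\beta_{0j}h_{t-j},\end{cases} \qquad \forall t \in \mathbb Z, \tag{7.1}$$
where $(\eta_t)$ is a sequence of iid variables of variance 1, $\omega_0 > 0$, $\alpha_{0i} \ge 0$ ($i = 1,\dots,q$), and $\beta_{0j} \ge 0$ ($j = 1,\dots,p$). The orders $p$ and $q$ are assumed known. The vector of the parameters
$$\theta = (\theta_1,\dots,\theta_{p+q+1})' = (\omega, \alpha_1,\dots,\alpha_q, \beta_1,\dots,\beta_p)' \tag{7.2}$$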


GARCH Models: Structure, Statistical Inference and Financial Applications Christian Francq and Jean-Michel Zakoïan
© 2010 John Wiley & Sons, Ltd




belongs to a parameter space of the form
$$\Theta \subset (0,+\infty) \times [0,\infty)^{p+q}. \tag{7.3}$$
The true value of the parameter is unknown, and is denoted by
$$\theta_0 = (\omega_0, \alpha_{01},\dots,\alpha_{0q}, \beta_{01},\dots,\beta_{0p})'.$$


To write the likelihood of the model, a distribution must be specified for the iid variables $\eta_t$. Here we do not make any assumption on the distribution of these variables, but we work with a function, called the (Gaussian) quasi-likelihood, which, conditionally on some initial values, coincides with the likelihood when the $\eta_t$ are distributed as standard Gaussian. Given initial values $\epsilon_0,\dots,\epsilon_{1-q}, \tilde\sigma_0^2,\dots,\tilde\sigma_{1-p}^2$ to be specified below, the conditional Gaussian quasi-likelihood is given by
$$L_n(\theta) = L_n(\theta; \epsilon_1,\dots,\epsilon_n) = \prod_{t=1}^n\frac{1}{\sqrt{2\pi\tilde\sigma_t^2}}\exp\left(-\frac{\epsilon_t^2}{2\tilde\sigma_t^2}\right),$$
where the $\tilde\sigma_t^2$ are recursively defined, for $t \ge 1$, by
$$\tilde\sigma_t^2 = \tilde\sigma_t^2(\theta) = \omega + \sum_{i=1}^q\alpha_i\epsilon_{t-i}^2 + \sum_{j=1}^p\beta_j\tilde\sigma_{t-j}^2. \tag{7.4}$$
i=l j=l 


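For a given value of $\theta$, under the second-order stationarity assumption, the unconditional variance (corresponding to this value of $\theta$) is a reasonable choice for the unknown initial values:
$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \frac{\omega}{1 - \sum_{i=1}^q\alpha_i - \sum_{j=1}^p\beta_j}. \tag{7.5}$$
Such initial values are, however, not suitable for IGARCH models, in particular, and more generally when the second-order stationarity is not imposed. Indeed, the constant (7.5) would then take negative values for some values of $\theta$. In such a case, suitable initial values are
$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \omega \tag{7.6}$$
or
$$\epsilon_0^2 = \dots = \epsilon_{1-q}^2 = \tilde\sigma_0^2 = \dots = \tilde\sigma_{1-p}^2 = \epsilon_1^2. \tag{7.7}$$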


A QMLE of $\theta_0$ is defined as any measurable solution $\hat\theta_n$ of
$$\hat\theta_n = \arg\max_{\theta\in\Theta}L_n(\theta).$$
Taking the logarithm, it is seen that maximizing the likelihood is equivalent to minimizing, with respect to $\theta$,
$$\tilde I_n(\theta) = n^{-1}\sum_{t=1}^n\tilde\ell_t, \qquad\text{where } \tilde\ell_t = \tilde\ell_t(\theta) = \frac{\epsilon_t^2}{\tilde\sigma_t^2} + \log\tilde\sigma_t^2 \tag{7.8}$$
and $\tilde\sigma_t^2$ is defined by (7.4). A QMLE is thus a measurable solution of the equation
$$\hat\theta_n = \arg\min_{\theta\in\Theta}\tilde I_n(\theta). \tag{7.9}$$
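The recursion (7.4) and the criterion (7.8) translate directly into code. The Python sketch below, written for a GARCH(1, 1) only, evaluates $\tilde I_n(\theta)$ with either the initial values (7.6) or (7.7). The function name and the `init` switch are hypothetical. Solving (7.9) would additionally require a numerical optimizer over $\Theta$, which is not shown.

```python
import math


def garch11_qml_criterion(theta, eps, init="omega"):
    """Gaussian quasi-likelihood criterion I_n(theta) of (7.8) for a GARCH(1, 1).

    theta = (omega, alpha, beta); eps = [eps_1, ..., eps_n].
    init = "omega": eps_0^2 = sigma_tilde_0^2 = omega, as in (7.6);
    init = "eps1":  eps_0^2 = sigma_tilde_0^2 = eps_1^2, as in (7.7)."""
    omega, alpha, beta = theta
    start = omega if init == "omega" else eps[0] ** 2
    prev_eps2, prev_sig2 = start, start
    total = 0.0
    for e in eps:
        sig2 = omega + alpha * prev_eps2 + beta * prev_sig2   # recursion (7.4)
        total += e * e / sig2 + math.log(sig2)              # l_t of (7.8)
        prev_eps2, prev_sig2 = e * e, sig2
    return total / len(eps)


print(garch11_qml_criterion((0.5, 0.2, 0.3), [1.0, 2.0]))
```

The loop performs a fixed number of operations per observation, which is the order-$n$ cost mentioned below.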


It will be shown that the choice of the initial values is unimportant for the asymptotic properties of the QMLE. However, in practice this choice may be important. Note that other methods are possible for generating the sequence $\tilde\sigma_t^2$; for example, by taking $\tilde\sigma_t^2 = c_0(\theta) + \sum_{i=1}^{t-1}c_i(\theta)\epsilon_{t-i}^2$, where the $c_i(\theta)$ are recursively computed (see Berkes, Horváth and Kokoszka, 2003b). Note that for computing $\tilde I_n(\theta)$, this procedure involves a number of operations of order $n^2$, whereas the one we propose involves a number of order $n$. It will be convenient to approximate the sequence $(\tilde\ell_t(\theta))$ by an ergodic stationary sequence. Assuming that the roots of $\mathcal{B}_\theta(z)$ are outside the unit disk, the nonanticipative and ergodic strictly stationary sequence $(\sigma_t^2)_t = \{\sigma_t^2(\theta)\}_t$ is defined as the solution of

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \quad \forall t. \qquad (7.10)$$

Note that $\sigma_t^2(\theta_0) = h_t$.
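To make the order-$n$ cost of the recursion concrete, here is a minimal Python sketch of (7.4) and of the criterion (7.8) for a GARCH(p, q). It assumes the parameter ordering $\theta = (\omega, \alpha_1, \dots, \alpha_q, \beta_1, \dots, \beta_p)$ of (7.2) and implements only the initial values (7.6) and (7.7); the function names `sigma2_tilde` and `qml_criterion` are hypothetical, not taken from any library.

```python
import math
from typing import List, Sequence


def sigma2_tilde(theta: Sequence[float], eps: Sequence[float], q: int, p: int,
                 init: str = "omega") -> List[float]:
    """sigma~_t^2(theta), t = 1..n, from recursion (7.4).

    theta = (omega, alpha_1..alpha_q, beta_1..beta_p), eps = (eps_1..eps_n).
    init="omega" uses the initial values (7.6); init="eps1" uses (7.7).
    """
    omega, alpha, beta = theta[0], theta[1:1 + q], theta[1 + q:1 + q + p]
    start = omega if init == "omega" else eps[0] ** 2
    # e2[q + t - 1] holds eps_t^2; the first q slots hold eps~_{1-q}^2..eps~_0^2.
    e2 = [start] * q + [x * x for x in eps]
    # s2[p + t - 1] holds sigma~_t^2; the first p slots hold sigma~_{1-p}^2..sigma~_0^2.
    s2 = [start] * p
    for t in range(1, len(eps) + 1):
        v = omega
        v += sum(alpha[i - 1] * e2[q + t - i - 1] for i in range(1, q + 1))
        v += sum(beta[j - 1] * s2[p + t - j - 1] for j in range(1, p + 1))
        s2.append(v)
    return s2[p:]


def qml_criterion(theta, eps, q, p, init="omega"):
    """I~_n(theta) of (7.8): mean of eps_t^2 / sigma~_t^2 + log sigma~_t^2."""
    s2 = sigma2_tilde(theta, eps, q, p, init)
    return sum(x * x / v + math.log(v) for x, v in zip(eps, s2)) / len(eps)


eps = [0.3, -1.2, 0.8, 2.1, -0.4]
print(qml_criterion((0.1, 0.1, 0.8), eps, q=1, p=1))
```

Each evaluation of the criterion is a single pass over the data; in practice it would be handed to a numerical optimizer restricted to $\Theta$.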


Likelihood Equations 


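Likelihood equations are obtained by canceling the derivative of the criterion $\tilde I_n(\theta)$ with respect to $\theta$, which gives

$$\frac{1}{n}\sum_{t=1}^{n}\left\{\epsilon_t^2 - \tilde\sigma_t^2\right\}\frac{1}{\tilde\sigma_t^4}\frac{\partial\tilde\sigma_t^2}{\partial\theta} = 0. \qquad (7.11)$$

These equations can be interpreted as orthogonality relations, for large $n$. Indeed, as will be seen in the next section, the left-hand side of equation (7.11) has the same asymptotic behavior as

$$\frac{1}{n}\sum_{t=1}^{n}\left\{\epsilon_t^2 - \sigma_t^2\right\}\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta},$$

the impact of the initial values vanishing as $n \to \infty$. The innovation of $\epsilon_t^2$ is $\nu_t = \epsilon_t^2 - h_t$. Thus, under the assumption that the expectation exists, we have

$$E_{\theta_0}\left\{\nu_t\frac{1}{\sigma_t^4(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\right\} = 0,$$

because $\sigma_t^{-4}(\theta_0)\,\partial\sigma_t^2(\theta_0)/\partial\theta$ is a measurable function of the $\epsilon_{t-i}$, $i > 0$. This result can be viewed as the asymptotic version of (7.11) at $\hat\theta_n$, using the ergodic theorem.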


7.1.1 Asymptotic Properties of the QMLE 


In this chapter, we will use the matrix norm defined by $\|A\| = \sum|a_{ij}|$ for all matrices $A = (a_{ij})$. The spectral radius of a square matrix $A$ is denoted by $\rho(A)$.

Strong Consistency


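Recall that model (7.1) admits a strictly stationary solution if and only if the sequence of matrices $A_0 = (A_{0t})$, where

$$A_{0t} = \begin{pmatrix}
\boldsymbol{\alpha}_{0,1:q-1}\,\eta_t^2 & \alpha_{0q}\eta_t^2 & \boldsymbol{\beta}_{0,1:p-1}\,\eta_t^2 & \beta_{0p}\eta_t^2\\
I_{q-1} & 0 & 0 & 0\\
\boldsymbol{\alpha}_{0,1:q-1} & \alpha_{0q} & \boldsymbol{\beta}_{0,1:p-1} & \beta_{0p}\\
0 & 0 & I_{p-1} & 0
\end{pmatrix}, \quad \boldsymbol{\alpha}_{0,1:q-1} = (\alpha_{01}, \dots, \alpha_{0,q-1}), \quad \boldsymbol{\beta}_{0,1:p-1} = (\beta_{01}, \dots, \beta_{0,p-1}),$$

admits a strictly negative top Lyapunov exponent, $\gamma(A_0) < 0$, where

$$\gamma(A_0) := \inf_{t\in\mathbb{N}^*}\frac{1}{t}E\left(\log\|A_{0t}A_{0,t-1}\cdots A_{01}\|\right) = \lim_{t\to\infty}\ \text{a.s.}\ \frac{1}{t}\log\|A_{0t}A_{0,t-1}\cdots A_{01}\|. \qquad (7.12)$$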
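For a GARCH(1, 1) ($p = q = 1$) the matrix reduces to $A_{0t} = \begin{pmatrix}\alpha_0\eta_t^2 & \beta_0\eta_t^2\\ \alpha_0 & \beta_0\end{pmatrix}$, which has rank one, so that $\gamma(A_0) = E\log(\alpha_0\eta_t^2 + \beta_0)$. The sketch below estimates $\gamma(A_0)$ from the almost sure limit in (7.12), renormalizing the running product at every step to avoid overflow, and compares it with that closed form. It assumes $\eta_t \sim \mathcal{N}(0,1)$; the function names are hypothetical.

```python
import math
import random


def lyapunov_garch11(alpha, beta, n_steps=50_000, seed=0):
    """Estimate gamma(A_0) for a GARCH(1,1) with N(0,1) eta_t via (7.12).

    Accumulates (1/t) log ||A_t ... A_1|| with ||A|| = sum |a_ij|, renormalizing
    the running product at each step to avoid overflow or underflow.
    """
    rng = random.Random(seed)
    prod = [[1.0, 0.0], [0.0, 1.0]]
    log_norm = 0.0
    for _ in range(n_steps):
        e2 = rng.gauss(0.0, 1.0) ** 2
        a_t = [[alpha * e2, beta * e2], [alpha, beta]]
        # left-multiply, so that prod is proportional to A_t A_{t-1} ... A_1
        prod = [[sum(a_t[i][k] * prod[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
        norm = sum(abs(v) for row in prod for v in row)
        log_norm += math.log(norm)
        prod = [[v / norm for v in row] for row in prod]
    return log_norm / n_steps


def lyapunov_direct(alpha, beta, n_steps=50_000, seed=0):
    """Same quantity from the GARCH(1,1) closed form gamma = E log(alpha eta^2 + beta)."""
    rng = random.Random(seed)
    return sum(math.log(alpha * rng.gauss(0.0, 1.0) ** 2 + beta)
               for _ in range(n_steps)) / n_steps


print(lyapunov_garch11(0.1, 0.85), lyapunov_direct(0.1, 0.85))
```

With $\beta_0 = 0$ and $\alpha_0 = 3$ the estimate is negative (strict stationarity despite $\alpha_0 > 1$), while $\alpha_0 = 4$ gives a positive value.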
t>0o $ 


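Let

$$\mathcal{A}_\theta(z) = \sum_{i=1}^{q}\alpha_i z^i \quad \text{and} \quad \mathcal{B}_\theta(z) = 1 - \sum_{j=1}^{p}\beta_j z^j.$$

By convention, $\mathcal{A}_\theta(z) = 0$ if $q = 0$ and $\mathcal{B}_\theta(z) = 1$ if $p = 0$. To show strong consistency, the following assumptions are used.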


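A1: $\theta_0 \in \Theta$ and $\Theta$ is compact.

A2: $\gamma(A_0) < 0$ and for all $\theta \in \Theta$, $\sum_{j=1}^{p}\beta_j < 1$.

A3: $\eta_t^2$ has a nondegenerate distribution and $E\eta_t^2 = 1$.

A4: If $p > 0$, $\mathcal{A}_{\theta_0}(z)$ and $\mathcal{B}_{\theta_0}(z)$ have no common roots, $\mathcal{A}_{\theta_0}(1) \neq 0$, and $\alpha_{0q} + \beta_{0p} \neq 0$.

Note that, by Corollary 2.2, the second part of assumption A2 implies that the roots of $\mathcal{B}_\theta(z)$ are outside the unit disk. Thus, a nonanticipative and ergodic strictly stationary sequence $(\sigma_t^2)_t$ is defined by (7.10). Similarly, define

$$I_n(\theta) = I_n(\theta; \epsilon_n, \epsilon_{n-1}, \dots) = n^{-1}\sum_{t=1}^{n}\ell_t, \qquad \ell_t = \ell_t(\theta) = \frac{\epsilon_t^2}{\sigma_t^2} + \log\sigma_t^2.$$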


Example 7.1 (Parameter space of a GARCH(1, 1) process) In the case of a GARCH(1, 1) process, assumptions A1 and A2 hold true when, for instance, the parameter space is of the form

$$\Theta = [\delta, 1/\delta]\times[0, 1/\delta]\times[0, 1 - \delta],$$

where $\delta \in (0, 1)$ is a constant, small enough so that the true value $\theta_0 = (\omega_0, \alpha_0, \beta_0)'$ belongs to $\Theta$. Figure 7.1 displays, in the plane $(\alpha, \beta)$, the zones of strict stationarity (when $\eta_t$ is $\mathcal{N}(0, 1)$ distributed) and of second-order stationarity, as well as an example of a parameter space $\Theta$ (the gray zone) compatible with assumptions A1 and A2.

[Figure 7.1: axes $\alpha$ (horizontal) and $\beta$ (vertical); marks at $\alpha = 1$ and $\alpha = 2e^{\gamma} \approx 3.56$, with $\gamma$ Euler's constant.]

Figure 7.1 GARCH(1, 1): zones of strict and second-order stationarity and parameter space $\Theta = [\underline{\omega}, \overline{\omega}]\times[0, \overline{\alpha}]\times[0, \overline{\beta}]$.


The first result states the strong consistency of $\hat\theta_n$. The proof of this theorem, and of the next ones, is given in Section 7.4.

Theorem 7.1 (Strong consistency of the QMLE) Let $(\hat\theta_n)$ be a sequence of QMLEs satisfying (7.9), with initial conditions (7.6) or (7.7). Under assumptions A1–A4, almost surely

$$\hat\theta_n \to \theta_0, \quad \text{as } n \to \infty.$$


Remark 7.1 


1. It is not assumed that the true value of the parameter $\theta_0$ belongs to the interior of $\Theta$. Thus, the theorem allows us to handle cases where some coefficients, $\alpha_i$ or $\beta_j$, are null.

2. It is important to note that the strict stationarity condition is only assumed at $\theta_0$, not over all $\Theta$. In view of Corollary 2.2, the condition $\sum_{j=1}^{p}\beta_j < 1$ is weaker than the strict stationarity condition.

3. Assumption A4 disappears in the ARCH case. In the general case, this assumption allows for an overidentification of either of the two orders, $p$ or $q$, but not of both. We then consistently estimate the parameters of a GARCH($p - 1$, $q$) (or GARCH($p$, $q - 1$)) process if an overparameterized GARCH($p$, $q$) model is used.

4. When $p \neq 0$, assumption A4 precludes the case where all the $\alpha_{0i}$ are zero. In such a case, the strictly stationary solution of the model is the strong white noise, which can be written in multiple forms. For instance, a strong white noise of variance 1 can be written in the GARCH(1, 1) form with $\sigma_t^2 = \sigma^2(1 - \beta) + 0\times\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2$.

5. The assumption of absence of a common root, in A4, is restrictive only if $p > 1$ and $q > 1$. Indeed if $q = 1$, the unique root of $\mathcal{A}_{\theta_0}(z)$ is 0 and we have $\mathcal{B}_{\theta_0}(0) \neq 0$. If $p = 1$ and $\beta_{01} \neq 0$, the unique root of $\mathcal{B}_{\theta_0}(z)$ is $1/\beta_{01} > 0$ (if $\beta_{01} = 0$, the polynomial does not admit any root). Because the coefficients $\alpha_{0i}$ are positive, this value cannot be a zero of $\mathcal{A}_{\theta_0}(z)$.

6. The assumption $E\eta_t = 0$ is not required for the consistency (and asymptotic normality) of the QMLE of a GARCH. The conditional variance of $\epsilon_t$ is thus, in general, only proportional to $h_t$: $\mathrm{Var}(\epsilon_t \mid \epsilon_u, u < t) = \{1 - (E\eta_t)^2\}h_t$. The assumption $E\eta_t^2 = 1$ is made for identifiability reasons and is not restrictive provided that $E\eta_t^2 < \infty$.
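Asymptotic Normality

The following additional assumptions are considered.

A5: $\theta_0 \in \overset{\circ}{\Theta}$, where $\overset{\circ}{\Theta}$ denotes the interior of $\Theta$.

A6: $\kappa_\eta = E\eta_t^4 < \infty$.

The limiting distribution of $\hat\theta_n$ is given by the following result.

Theorem 7.2 (Asymptotic normality of the QMLE) Under assumptions A1–A6,

$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{\mathcal{L}} \mathcal{N}\{0, (\kappa_\eta - 1)J^{-1}\},$$

where

$$J := E_{\theta_0}\left(\frac{\partial^2\ell_t(\theta_0)}{\partial\theta\,\partial\theta'}\right) = E_{\theta_0}\left(\frac{1}{\sigma_t^4(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'}\right) \qquad (7.13)$$

is a positive definite matrix.

Remark 7.2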


1. Assumption A5 is standard and entails the first-order condition (at least asymptotically). Indeed if $\hat\theta_n$ is consistent, it also belongs to the interior of $\Theta$, for large $n$. At this maximum the derivative of the objective function cancels. However, assumption A5 is restrictive because it precludes, for instance, the case $\alpha_{01} = 0$.

2. When one or several components of $\theta_0$ are null, assumption A5 is not satisfied and the theorem cannot be used. It is clear that, in this case, the asymptotic distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ cannot be normal because the estimator is constrained. If, for instance, $\alpha_{01} = 0$, the distribution of $\sqrt{n}(\hat\alpha_1 - \alpha_{01})$ is concentrated in $[0, \infty)$, for all $n$, and thus cannot be asymptotically normal. This kind of 'boundary' problem is the object of a specific study in Chapter 8.

3. Assumption A6 does not concern $\epsilon_t$, and does not preclude the IGARCH case. Only a fourth-order moment assumption on $\eta_t$ is required. This assumption is clearly necessary for the existence of the variance of the score vector $\partial\ell_t(\theta_0)/\partial\theta$. In the proof of this theorem, it is shown that

$$E_{\theta_0}\left\{\frac{\partial\ell_t(\theta_0)}{\partial\theta}\right\} = 0, \qquad \mathrm{Var}_{\theta_0}\left\{\frac{\partial\ell_t(\theta_0)}{\partial\theta}\right\} = (\kappa_\eta - 1)J.$$

4. In the ARCH case ($p = 0$), the asymptotic variance of the QMLE reduces to that of the FGLS estimator (see Theorem 6.3). Indeed, in this case we have $\partial\sigma_t^2(\theta)/\partial\theta = Z_{t-1}$. Theorem 6.3 requires, however, the existence of a fourth-order moment for the observed process, whereas there is no moment assumption for the asymptotic normality of the QMLE. Moreover, Theorem 6.4 shows that the QMLE of an ARCH(q) is asymptotically more accurate than the OLS estimator.
7.1.2 The ARCH(1) Case: Numerical Evaluation
of the Asymptotic Variance 


Consider the ARCH(1) model

$$\epsilon_t = \left(\omega_0 + \alpha_0\epsilon_{t-1}^2\right)^{1/2}\eta_t,$$

with $\omega_0 > 0$ and $\alpha_0 > 0$, and suppose that the variables $\eta_t$ satisfy assumption A3. The parameter is $\theta = (\omega, \alpha)'$. In view of (2.10), the strict stationarity constraint A2 is written as

$$\alpha_0 < \exp\{-E(\log\eta_t^2)\}.$$
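For $\eta_t \sim \mathcal{N}(0,1)$, $E\log\eta_t^2 = -\gamma - \log 2$, with $\gamma \approx 0.5772$ Euler's constant, so the constraint reads $\alpha_0 < 2e^{\gamma} \approx 3.56$, the value marked in Figure 7.1. A short sketch, assuming Gaussian innovations, checks this closed form against a Monte Carlo estimate of $\exp\{-E\log\eta_t^2\}$; the function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def arch1_strict_stationarity_bound_gaussian() -> float:
    """exp{-E log eta_t^2} for eta_t ~ N(0,1).

    E log eta_t^2 = -EULER_GAMMA - log 2, so the bound equals 2 * exp(EULER_GAMMA).
    """
    return 2.0 * math.exp(EULER_GAMMA)


def arch1_strict_stationarity_bound_mc(n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of exp{-E log eta_t^2} from n N(0,1) draws."""
    rng = random.Random(seed)
    mean_log = sum(math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)) / n
    return math.exp(-mean_log)


print(round(arch1_strict_stationarity_bound_gaussian(), 4))  # 3.5621
print(arch1_strict_stationarity_bound_mc())
```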


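Assumption A1 holds true if, for instance, the parameter space is of the form $\Theta = [\delta, 1/\delta]\times[0, 1/\delta]$, where $\delta > 0$ is a constant, chosen sufficiently small so that $\theta_0 = (\omega_0, \alpha_0)'$ belongs to $\Theta$. By Theorem 7.1, the QMLE of $\theta$ is then strongly consistent. Since $\partial\tilde\sigma_t^2/\partial\theta = (1, \epsilon_{t-1}^2)'$, the QMLE $\hat\theta_n = (\hat\omega_n, \hat\alpha_n)'$ is characterized by the normal equation

$$\frac{1}{n}\sum_{t=1}^{n}\frac{\epsilon_t^2 - \hat\omega_n - \hat\alpha_n\epsilon_{t-1}^2}{\left(\hat\omega_n + \hat\alpha_n\epsilon_{t-1}^2\right)^2}\begin{pmatrix}1\\ \epsilon_{t-1}^2\end{pmatrix} = 0$$

with, for instance, $\epsilon_0^2 = \epsilon_1^2$. This estimator does not have an explicit form and must be obtained numerically. Theorem 7.2, which provides the asymptotic distribution of the estimator, only requires the extra assumption that $\theta_0$ belongs to $\overset{\circ}{\Theta} = (\delta, 1/\delta)\times(0, 1/\delta)$. Thus, if $\alpha_0 = 0$ (that is, if the model is conditionally homoscedastic), the estimator remains consistent but is no longer asymptotically normal. Matrix $J$ takes the form

$$J = E_{\theta_0}\begin{pmatrix}\dfrac{1}{(\omega_0 + \alpha_0\epsilon_{t-1}^2)^2} & \dfrac{\epsilon_{t-1}^2}{(\omega_0 + \alpha_0\epsilon_{t-1}^2)^2}\\[2ex] \dfrac{\epsilon_{t-1}^2}{(\omega_0 + \alpha_0\epsilon_{t-1}^2)^2} & \dfrac{\epsilon_{t-1}^4}{(\omega_0 + \alpha_0\epsilon_{t-1}^2)^2}\end{pmatrix},$$

and the asymptotic variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ is

$$\mathrm{Var}_{\mathrm{as}}\{\sqrt{n}(\hat\theta_n - \theta_0)\} = (\kappa_\eta - 1)J^{-1}.$$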


Table 7.1 displays numerical evaluations of this matrix. An estimation of $J$ is obtained by replacing the expectations by empirical means, obtained from simulations of length 10 000, when $\eta_t$ is $\mathcal{N}(0, 1)$ distributed. This experiment is repeated 1000 times to obtain the results presented in the table.
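The computation behind Table 7.1 can be sketched as follows, assuming $\eta_t \sim \mathcal{N}(0,1)$ (so $\kappa_\eta = 3$): simulate one path, replace the expectations in $J$ by empirical means, and return $(\kappa_\eta - 1)J^{-1}$. This is a single-path illustration (the table averages 1000 replications); the function name, the burn-in length and the starting value are our own choices.

```python
import random


def arch1_asymptotic_variance(omega0, alpha0, n=10_000, burn=500, seed=0):
    """Estimate (kappa_eta - 1) J^{-1} for an ARCH(1) with N(0,1) innovations.

    J = E[(1, x)'(1, x) / (omega0 + alpha0 * x)^2] with x = eps_{t-1}^2 is
    replaced by an empirical mean over one simulated path of length n.
    """
    rng = random.Random(seed)
    kappa = 3.0                       # E eta^4 for N(0,1)
    eps2 = omega0                     # arbitrary starting value, washed out by burn-in
    j11 = j12 = j22 = 0.0
    for t in range(burn + n):
        x = eps2                      # eps_{t-1}^2
        sigma2 = omega0 + alpha0 * x  # sigma_t^2(theta_0)
        eps2 = sigma2 * rng.gauss(0.0, 1.0) ** 2
        if t >= burn:
            w = 1.0 / sigma2 ** 2
            j11 += w
            j12 += w * x
            j22 += w * x * x
    j11, j12, j22 = j11 / n, j12 / n, j22 / n
    det = j11 * j22 - j12 * j12
    c = (kappa - 1.0) / det           # (kappa - 1) J^{-1} for a symmetric 2x2 J
    return [[c * j22, -c * j12], [-c * j12, c * j11]]


for a0 in (0.1, 0.5, 0.95):
    print(a0, arch1_asymptotic_variance(1.0, a0))
```

With $\alpha_0 = 0$ the exact value is $\begin{pmatrix}3 & -1\\ -1 & 1\end{pmatrix}$, a convenient sanity check.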

Table 7.1 Asymptotic variance for the QMLE of an ARCH(1) process with $\eta_t \sim \mathcal{N}(0, 1)$.

                                     ω0 = 1, α0 = 0.1    ω0 = 1, α0 = 0.5    ω0 = 1, α0 = 0.95
Var_as{√n(θ̂n − θ0)}                  3.46    −1.34       4.85    −2.15       6.61    −2.83
                                     −1.34      …        −2.15      …        −2.83      …

In order to assess, in finite samples, the quality of the asymptotic approximation of the variance of the estimator, the following Monte Carlo experiment is conducted. For the value $\theta_0$ of the parameter, and for a given length $n$, $N$ samples are simulated, leading to $N$ estimations $\hat\theta_n^{(i)}$ of $\theta_0$, $i = 1, \dots, N$. We denote by $\bar\theta_n = (\bar\omega_n, \bar\alpha_n)'$ their empirical mean. The root mean squared error (RMSE) of estimation of $\alpha$ is denoted by

$$\mathrm{RMSE}(\alpha) = \left\{\frac{1}{N}\sum_{i=1}^{N}\left(\hat\alpha_n^{(i)} - \alpha_0\right)^2\right\}^{1/2}$$

and can be compared to $\left\{\mathrm{Var}_{\mathrm{as}}[\sqrt{n}(\hat\alpha_n - \alpha_0)]\right\}^{1/2}/\sqrt{n}$, the latter quantity being evaluated independently, by simulation. A similar comparison can obviously be made for the parameter $\omega$. For $\theta_0 = (0.2, 0.9)'$ and $N = 1000$, Table 7.2 displays the results, for different sample lengths $n$.

Table 7.2 Comparison of the empirical and theoretical asymptotic variances, for the QMLE of the parameter $\alpha_0 = 0.9$ of an ARCH(1), when $\eta_t \sim \mathcal{N}(0, 1)$.

n       ᾱn        RMSE(α)    {Var_as[√n(α̂n − α0)]}^{1/2}/√n    P̂[α̂n > 1]
100     0.85221   0.25742    0.25014                             0.266
250     0.88336   0.16355    0.15820                             0.239
500     0.89266   0.10659    0.11186                             0.152
1000    0.89804   0.08143    0.07911                             0.100

The similarity between columns 3 and 4 is quite satisfactory, even for moderate sample sizes. The last column gives the empirical probability (that is, the relative frequency within the $N$ samples) that $\hat\alpha_n$ is greater than 1 (which is the limiting value for second-order stationarity). These results show that, even if the mean of the estimations is close to the true value for large $n$, the variability of the estimator remains high. Finally, note that the length $n = 1000$ remains realistic for financial series.
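A compact version of this Monte Carlo experiment is sketched below. The QMLE is computed by iterating weighted least squares of $\epsilon_t^2$ on $(1, \epsilon_{t-1}^2)$ with weights $1/(\omega + \alpha\epsilon_{t-1}^2)^2$ evaluated at the previous iterate; this is a Fisher-scoring scheme whose fixed points solve the normal equation of the ARCH(1) QMLE (when the positivity constraints are not active). The solver, the starting point, the initial value $\epsilon_0^2 = \epsilon_1^2$ and all function names are assumptions of this sketch, not the procedure used to produce Table 7.2.

```python
import math
import random


def simulate_arch1(omega0, alpha0, n, rng, burn=100):
    """Draw eps_1..eps_n from eps_t = sqrt(omega0 + alpha0 * eps_{t-1}^2) * eta_t."""
    out, e2 = [], omega0
    for t in range(burn + n):
        e = math.sqrt(omega0 + alpha0 * e2) * rng.gauss(0.0, 1.0)
        e2 = e * e
        if t >= burn:
            out.append(e)
    return out


def arch1_qmle(eps, max_iter=200, tol=1e-10):
    """ARCH(1) QMLE by iterated weighted least squares (Fisher scoring)."""
    y = [e * e for e in eps]
    x = [y[0]] + y[:-1]                  # eps_{t-1}^2, with eps_0^2 = eps_1^2
    omega, alpha = sum(y) / len(y), 0.1  # crude starting point
    for _ in range(max_iter):
        s11 = s12 = s22 = r1 = r2 = 0.0
        for yt, xt in zip(y, x):
            w = 1.0 / (omega + alpha * xt) ** 2  # 1 / sigma_t^4 at the current iterate
            s11 += w
            s12 += w * xt
            s22 += w * xt * xt
            r1 += w * yt
            r2 += w * xt * yt
        det = s11 * s22 - s12 * s12
        new_omega = max((s22 * r1 - s12 * r2) / det, 1e-8)  # keep omega > 0
        new_alpha = max((s11 * r2 - s12 * r1) / det, 0.0)   # keep alpha >= 0
        converged = abs(new_omega - omega) + abs(new_alpha - alpha) < tol
        omega, alpha = new_omega, new_alpha
        if converged:
            break
    return omega, alpha


def monte_carlo(omega0, alpha0, n, n_rep, seed=0):
    """Empirical mean of alpha-hat, RMSE(alpha) and frequency of alpha-hat > 1."""
    rng = random.Random(seed)
    est = [arch1_qmle(simulate_arch1(omega0, alpha0, n, rng))[1] for _ in range(n_rep)]
    mean = sum(est) / n_rep
    rmse = math.sqrt(sum((a - alpha0) ** 2 for a in est) / n_rep)
    freq_gt_1 = sum(a > 1.0 for a in est) / n_rep
    return mean, rmse, freq_gt_1


print(monte_carlo(0.2, 0.9, n=250, n_rep=50, seed=1))
```

With the settings of Table 7.2 ($N = 1000$) this pure-Python loop is slow; a smaller number of replications gives a quicker, noisier check.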


7.1.3 The Nonstationary ARCH(1) 


When the strict stationarity constraint is not satisfied in the ARCH(1) case, that is, when

$$\alpha_0 \geq \exp\{-E\log\eta_t^2\}, \qquad (7.14)$$

one can define an ARCH(1) process starting with initial values. For a given value $\epsilon_0$, we define

$$\epsilon_t = \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \alpha_0\epsilon_{t-1}^2, \qquad t = 1, 2, \dots, \qquad (7.15)$$

where $\omega_0 > 0$ and $\alpha_0 > 0$, with the usual assumptions on the sequence $(\eta_t)$. As already noted, $h_t$ converges to infinity almost surely when

$$\alpha_0 > \exp\{-E\log\eta_t^2\}, \qquad (7.16)$$

and only in probability when the inequality (7.14) is an equality (see Corollary 2.1 and Remark 2.3 following it). Is it possible to estimate the coefficients of such a model? The answer is only partly positive: it is possible to consistently estimate the coefficient $\alpha_0$, but the coefficient $\omega_0$ cannot be consistently estimated. The practical impact of this result thus appears to be limited, but because of its theoretical interest, the problem of estimating coefficients of nonstationary models deserves attention. Consider the QMLE of an ARCH(1), that is to say a measurable solution of

$$(\hat\omega_n, \hat\alpha_n) = \arg\min_{\theta\in\Theta}\frac{1}{n}\sum_{t=1}^{n}\ell_t(\theta), \qquad \ell_t(\theta) = \frac{\epsilon_t^2}{\sigma_t^2(\theta)} + \log\sigma_t^2(\theta), \qquad (7.17)$$

where $\theta = (\omega, \alpha)$, $\Theta$ is a compact set of $(0, \infty)^2$, and $\sigma_t^2(\theta) = \omega + \alpha\epsilon_{t-1}^2$ for $t = 1, \dots, n$ (starting with a given initial value for $\epsilon_0^2$). The almost sure convergence of $\epsilon_t^2$ to infinity will be used to show the strong consistency of the QMLE of $\alpha_0$. The following lemma completes Corollary 2.1 and gives the rate of convergence of $\epsilon_t^2$ to infinity under (7.16).


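Lemma 7.1 Define the ARCH(1) model by (7.15) with any initial condition $\epsilon_0^2 \geq 0$. The nonstationarity condition (7.16) is assumed. Then, almost surely, as $n \to \infty$,

$$\frac{1}{h_n} = o(\rho^n) \quad \text{and} \quad \frac{1}{\epsilon_n^2} = o(\rho^n)$$

for any constant $\rho$ such that

$$1 > \rho > \frac{\exp\{-E\log\eta_t^2\}}{\alpha_0}. \qquad (7.18)$$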
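Lemma 7.1 can be visualized on the log scale: under (7.16), $\log\epsilon_t^2$ grows roughly linearly with slope $\log\alpha_0 + E\log\eta_t^2 > 0$, which is why $1/\epsilon_t^2$ vanishes geometrically. The sketch below, assuming $\eta_t \sim \mathcal{N}(0,1)$ and a hypothetical function name, iterates (7.15) in logs so that the explosive path does not overflow. For $\alpha_0 = 4$ the theoretical slope is $\log 4 - \gamma - \log 2 \approx 0.116$ (with $\gamma$ Euler's constant).

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def explosive_arch1_log_eps2(omega0, alpha0, n, eps0_sq=0.0, seed=0):
    """Return log(eps_t^2), t = 1..n, for the ARCH(1) (7.15) with N(0,1) eta_t.

    The recursion is run on the log scale: log h_t = log(omega0 + alpha0 * eps_{t-1}^2)
    and log eps_t^2 = log h_t + log eta_t^2, so huge values do not overflow.
    """
    rng = random.Random(seed)
    log_e2 = math.log(eps0_sq) if eps0_sq > 0 else -math.inf
    out = []
    for _ in range(n):
        if log_e2 < 700.0:   # exp() is safe here
            log_h = math.log(omega0 + alpha0 * math.exp(log_e2))
        else:                # log(a e^L + w) = L + log a + log1p(w e^{-L} / a)
            log_h = log_e2 + math.log(alpha0) + math.log1p(omega0 * math.exp(-log_e2) / alpha0)
        eta2 = max(rng.gauss(0.0, 1.0) ** 2, 1e-300)  # guard against an exact zero draw
        log_e2 = log_h + math.log(eta2)
        out.append(log_e2)
    return out


path = explosive_arch1_log_eps2(1.0, 4.0, 50_000)
print("empirical slope:", path[-1] / len(path))
print("theoretical slope:", math.log(4.0) - EULER_GAMMA - math.log(2.0))
```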


This result entails the strong consistency and asymptotic normality of the QMLE of $\alpha_0$.


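Theorem 7.3 Consider the assumptions of Lemma 7.1 and the QMLE defined by (7.17) where $\theta_0 = (\omega_0, \alpha_0) \in \Theta$. Then

$$\hat\alpha_n \to \alpha_0 \quad \text{a.s.}, \qquad (7.19)$$

and when $\theta_0$ belongs to the interior of $\Theta$,

$$\sqrt{n}(\hat\alpha_n - \alpha_0) \xrightarrow{\mathcal{L}} \mathcal{N}\{0, (\kappa_\eta - 1)\alpha_0^2\} \qquad (7.20)$$

as $n \to \infty$.

In the proof of this theorem, it is shown that the score vector satisfies

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{\partial\ell_t(\theta_0)}{\partial\theta} \xrightarrow{\mathcal{L}} \mathcal{N}\left\{0,\ J = (\kappa_\eta - 1)\begin{pmatrix}0 & 0\\ 0 & \alpha_0^{-2}\end{pmatrix}\right\}.$$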


In the standard statistical inference framework, the variance $J$ of the score vector is (proportional to) the Fisher information. According to the usual interpretation, the form of the matrix $J$ shows that, asymptotically and for almost all observations, the variations of the log-likelihood $n^{-1}\sum_{t=1}^{n}\log L_t(\theta)$ are insignificant when $\theta$ varies from $(\omega_0, \alpha_0)$ to $(\omega_0 + h, \alpha_0)$ for small $h$. In other words, the limiting log-likelihood is flat at the point $(\omega_0, \alpha_0)$ in the direction of variation of $\omega_0$. Thus, minimizing this limiting function does not allow $\omega_0$ to be found. This suggests that the QMLE of $\omega_0$ is likely to be inconsistent when the strict stationarity condition is not satisfied. Figure 7.2 displays numerical results illustrating the performance of the QMLE in finite samples. For different values of the parameters, 100 replications of the ARCH(1) model have been generated, for the sample sizes $n = 200$ and $n = 4000$. The top panels of the figure correspond to a second-order stationary ARCH(1), with parameter $\theta_0 = (1, 0.95)$. The panels in the middle correspond to a strictly stationary ARCH(1) of infinite variance, with $\theta_0 = (1, 1.5)$. The results obtained for these two cases are similar, confirming that second-order stationarity is not necessary for estimating an ARCH. The bottom panels, corresponding to the explosive ARCH(1) with parameter $\theta_0 = (1, 4)$, confirm the asymptotic results concerning the estimation of $\alpha_0$. They also illustrate the failure of the QML to estimate $\omega_0$ under the nonstationarity assumption (7.16). The results even deteriorate when the sample size increases.

Figure 7.2 Box-plots of the QML estimation errors for the parameters $\omega_0$ and $\alpha_0$ of an ARCH(1) process, with $\eta_t \sim \mathcal{N}(0, 1)$.


7.2 Estimation of ARMA-GARCH Models 
by Quasi-Maximum Likelihood 


In this section, the previous results are extended to cover the situation where the GARCH process is 
not directly observed, but constitutes the innovation of an observed ARMA process. This framework 
is relevant because, even for financial series, it is restrictive to assume that the observed series is 
the realization of a noise. From a theoretical point of view, it will be seen that the extension to the 
ARMA-GARCH case is far from trivial. Assume that the observations $X_1, \dots, X_n$ are generated by a strictly stationary nonanticipative solution of the ARMA(P, Q)-GARCH(p, q) model

$$\begin{cases}X_t - c_0 = \sum_{i=1}^{P}a_{0i}(X_{t-i} - c_0) + e_t - \sum_{j=1}^{Q}b_{0j}e_{t-j}\\ e_t = \sqrt{h_t}\,\eta_t\\ h_t = \omega_0 + \sum_{i=1}^{q}\alpha_{0i}e_{t-i}^2 + \sum_{j=1}^{p}\beta_{0j}h_{t-j}\end{cases} \qquad (7.21)$$

where $(\eta_t)$ and the coefficients $\omega_0$, $\alpha_{0i}$ and $\beta_{0j}$ are defined as in (7.1). The orders $P, Q, p, q$ are assumed known. The vector of the parameters is denoted by

$$\varphi = (\vartheta', \theta')' = (c, a_1, \dots, a_P, b_1, \dots, b_Q, \theta')',$$

where $\theta$ is defined as previously (see (7.2)). The parameter space is

$$\Phi \subset \mathbb{R}^{P+Q+1}\times(0, +\infty)\times[0, \infty)^{p+q}.$$

The true value of the parameter is denoted by

$$\varphi_0 = (\vartheta_0', \theta_0')' = (c_0, a_{01}, \dots, a_{0P}, b_{01}, \dots, b_{0Q}, \theta_0')'.$$


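We still employ a Gaussian quasi-likelihood conditional on initial values. If $q \geq Q$, the initial values are

$$X_0, \dots, X_{1-(q-Q)-P}, \quad \tilde\epsilon_{-q+Q}, \dots, \tilde\epsilon_{1-q}, \quad \tilde\sigma_0^2, \dots, \tilde\sigma_{1-p}^2.$$

These values (the last $p$ of which are positive) may depend on the parameter and/or on the observations. For any $\vartheta$, the values of $\tilde\epsilon_t(\vartheta)$, for $t = -q + Q + 1, \dots, n$, and then, for any $\theta$, the values of $\tilde\sigma_t^2(\varphi)$, for $t = 1, \dots, n$, can thus be computed from

$$\begin{cases}\tilde\epsilon_t = \tilde\epsilon_t(\vartheta) = X_t - c - \sum_{i=1}^{P}a_i(X_{t-i} - c) + \sum_{j=1}^{Q}b_j\tilde\epsilon_{t-j}\\ \tilde\sigma_t^2 = \tilde\sigma_t^2(\varphi) = \omega + \sum_{i=1}^{q}\alpha_i\tilde\epsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\tilde\sigma_{t-j}^2\end{cases} \qquad (7.22)$$

When $q < Q$, the fixed initial values are

$$X_0, \dots, X_{1-(q-Q)-P}, \quad \tilde\epsilon_0, \dots, \tilde\epsilon_{1-Q}, \quad \tilde\sigma_0^2, \dots, \tilde\sigma_{1-p}^2.$$

Conditionally on these initial values, the Gaussian log-likelihood is given by

$$\tilde I_n(\varphi) = n^{-1}\sum_{t=1}^{n}\tilde\ell_t, \qquad \tilde\ell_t = \tilde\ell_t(\varphi) = \frac{\tilde\epsilon_t^2(\vartheta)}{\tilde\sigma_t^2(\varphi)} + \log\tilde\sigma_t^2(\varphi).$$

A QMLE is defined as a measurable solution of the equation

$$\hat\varphi_n = \arg\min_{\varphi\in\Phi}\tilde I_n(\varphi).$$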
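The two recursions in (7.22) can be sketched as below for general orders. To stay short, the sketch does not reproduce the initial values described above: it assumes the simple (hypothetical) choice $X_t = c$ and $\tilde\epsilon_t = 0$ for $t \leq 0$ in the ARMA recursion, and pre-sample values equal to $\omega$ in the GARCH recursion, as in (7.6). The parameter ordering follows $\varphi$; the function name is hypothetical.

```python
import math
from typing import Sequence


def arma_garch_criterion(phi: Sequence[float], x: Sequence[float],
                         P: int, Q: int, p: int, q: int) -> float:
    """I~_n(phi) for an ARMA(P,Q)-GARCH(p,q), via the recursions (7.22).

    phi = (c, a_1..a_P, b_1..b_Q, omega, alpha_1..alpha_q, beta_1..beta_p).
    Simplifying assumption (not the chapter's exact scheme): pre-sample
    X_t = c and eps~_t = 0 in the ARMA recursion; pre-sample eps~^2_t and
    sigma~^2_t equal to omega in the GARCH recursion, as in (7.6).
    """
    c = phi[0]
    a = phi[1:1 + P]
    b = phi[1 + P:1 + P + Q]
    omega = phi[1 + P + Q]
    alpha = phi[2 + P + Q:2 + P + Q + q]
    beta = phi[2 + P + Q + q:2 + P + Q + q + p]
    n = len(x)
    xc = [xi - c for xi in x]
    eps = []                          # eps~_t(vartheta), first line of (7.22)
    for t in range(n):
        e = xc[t]
        e -= sum(a[i - 1] * (xc[t - i] if t - i >= 0 else 0.0) for i in range(1, P + 1))
        e += sum(b[j - 1] * (eps[t - j] if t - j >= 0 else 0.0) for j in range(1, Q + 1))
        eps.append(e)
    s2, total = [], 0.0               # sigma~_t^2(phi), second line of (7.22)
    for t in range(n):
        v = omega
        v += sum(alpha[i - 1] * (eps[t - i] ** 2 if t - i >= 0 else omega)
                 for i in range(1, q + 1))
        v += sum(beta[j - 1] * (s2[t - j] if t - j >= 0 else omega)
                 for j in range(1, p + 1))
        s2.append(v)
        total += eps[t] ** 2 / v + math.log(v)
    return total / n


print(arma_garch_criterion([0.0, 0.5, 0.1, 0.1, 0.8], [0.2, -0.5, 1.1, 0.3], P=1, Q=0, p=1, q=1))
```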


Strong Consistency 


Let $a_\vartheta(z) = 1 - \sum_{i=1}^{P}a_iz^i$ and $b_\vartheta(z) = 1 - \sum_{j=1}^{Q}b_jz^j$. Standard assumptions are made on these AR and MA polynomials, and assumption A1 is modified as follows:

A7: $\varphi_0 \in \Phi$ and $\Phi$ is compact.

A8: For all $\varphi \in \Phi$, $a_\vartheta(z)b_\vartheta(z) = 0$ implies $|z| > 1$.

A9: $a_{\vartheta_0}(z)$ and $b_{\vartheta_0}(z)$ have no common roots, $a_{0P} \neq 0$ or $b_{0Q} \neq 0$.

Under assumptions A2 and A8, $(X_t)$ is supposed to be the unique strictly stationary nonanticipative solution of (7.21). Let $\epsilon_t = \epsilon_t(\vartheta) = a_\vartheta(B)b_\vartheta^{-1}(B)(X_t - c)$ and $\ell_t = \ell_t(\varphi) = \epsilon_t^2/\sigma_t^2 + \log\sigma_t^2$, where $\sigma_t^2 = \sigma_t^2(\varphi)$ is the nonanticipative and ergodic strictly stationary solution of (7.10). Note that $e_t = \epsilon_t(\vartheta_0)$ and $h_t = \sigma_t^2(\varphi_0)$. The following result extends Theorem 7.1.

Theorem 7.4 (Consistency of the QMLE) Let $(\hat\varphi_n)$ be a sequence of QMLEs satisfying (7.2). Assume that $E\eta_t = 0$. Then, under assumptions A2–A4 and A7–A9, almost surely

$$\hat\varphi_n \to \varphi_0, \quad \text{as } n \to \infty.$$


Remark 7.3 


1. As in the pure GARCH case, the theorem does not impose a finite variance for $e_t$ (and thus for $X_t$). In the pure ARMA case, where $e_t = \eta_t$ admits a finite variance, this theorem reduces to a standard result concerning ARMA models with iid errors (see Brockwell and Davis, 1991, p. 384).

2. Apart from the condition $E\eta_t = 0$, the conditions required for the strong consistency of the QMLE are not stronger than in the pure GARCH case.


Asymptotic Normality When the Moment of Order 4 Exists 


So far, the asymptotic results of the QMLE (consistency and asymptotic normality in the pure 
GARCH case, consistency in the ARMA-GARCH case) have not required any moment assumption 
on the observed process (for the asymptotic normality in the pure GARCH case, a moment of 
order 4 is assumed for the iid process, not for $\epsilon_t$). One might think that this will be the same for
establishing the asymptotic normality in the ARMA-GARCH case. The following example shows 
that this is not the case. 


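Example 7.2 (Nonexistence of J without moment assumption) Consider the AR(1)-ARCH(1) model

$$X_t = a_{01}X_{t-1} + \epsilon_t, \qquad \epsilon_t = \sqrt{h_t}\,\eta_t, \qquad h_t = \omega_0 + \alpha_0\epsilon_{t-1}^2, \qquad (7.23)$$

where $|a_{01}| < 1$, $\omega_0 > 0$, $\alpha_0 \geq 0$, and the distribution of the iid sequence $(\eta_t)$ is defined, for $a > 1$, by

$$P\left(\eta_t = \sqrt{a}\right) = P\left(\eta_t = -\sqrt{a}\right) = \frac{1}{2a}, \qquad P(\eta_t = 0) = 1 - \frac{1}{a}.$$

Then the process $(X_t)$ is always stationary, for any value of $\alpha_0$ (because $\exp\{-E(\log\eta_t^2)\} = +\infty$; see the strict stationarity constraint (2.10)). By contrast, $X_t$ does not admit a moment of order 2 when $\alpha_0 \geq 1$ (see Theorem 2.2). The first component of the (normalized) score vector is

$$\frac{\partial\ell_t(\theta_0)}{\partial a_1} = \left(1 - \frac{\epsilon_t^2}{h_t}\right)\frac{1}{h_t}\frac{\partial h_t}{\partial a_1} + \frac{2\epsilon_t}{h_t}\frac{\partial\epsilon_t}{\partial a_1} = -2\alpha_0\left(1 - \eta_t^2\right)\frac{\epsilon_{t-1}X_{t-2}}{h_t} - 2\eta_t\frac{X_{t-1}}{\sqrt{h_t}}.$$

We have

$$E\left\{\frac{\partial\ell_t(\theta_0)}{\partial a_1}\right\}^2 \geq E\left[\left\{2\alpha_0\left(1 - \eta_t^2\right)\frac{\epsilon_{t-1}X_{t-2}}{h_t} + 2\eta_t\frac{X_{t-1}}{\sqrt{h_t}}\right\}^2\mathbb{1}_{\{\eta_{t-1} = 0\}}\right] = \frac{4a_{01}^2}{\omega_0}\left(1 - \frac{1}{a}\right)E\left(X_{t-2}^2\right)$$

since, first, $\eta_{t-1} = 0$ entails $\epsilon_{t-1} = 0$ and $X_{t-1} = a_{01}X_{t-2}$, and second, $\eta_{t-1}$ and $X_{t-2}$ are independent. Consequently, if $EX_t^2 = \infty$ and $a_{01} \neq 0$, the score vector does not admit a variance.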


This example shows that it is not possible to extend the result of asymptotic normality obtained in the GARCH case to the ARMA-GARCH models without additional moment assumptions. This is not surprising because for ARMA models (which can be viewed as limits of ARMA-GARCH models when the coefficients $\alpha_{0i}$ and $\beta_{0j}$ tend to 0) the asymptotic normality of the QMLE is shown with second-order moment assumptions. For an ARMA with infinite variance innovations, the rate of convergence of the estimators may be faster than in the standard case and the asymptotic distribution is stable, but non-Gaussian in general. We show the asymptotic normality with a moment assumption of order 4. Recall that, by Theorem 2.9, this assumption is equivalent to $\rho\{E(A_{0t}\otimes A_{0t})\} < 1$. We make the following assumptions:

A10: $\rho\{E(A_{0t}\otimes A_{0t})\} < 1$ and, for all $\theta \in \Theta$, $\sum_{j=1}^{p}\beta_j < 1$.

A11: $\varphi_0 \in \overset{\circ}{\Phi}$, where $\overset{\circ}{\Phi}$ denotes the interior of $\Phi$.

A12: There exists no set $\Lambda$ of cardinality 2 such that $P(\eta_t \in \Lambda) = 1$.

Assumption A10 implies that $\kappa_\eta = E(\eta_t^4) < \infty$ and makes assumption A2 superfluous. The identifiability assumption A12 is slightly stronger than the first part of assumption A3 when the distribution of $\eta_t$ is not symmetric. We are now in a position to state conditions ensuring the asymptotic normality of the QMLE of an ARMA-GARCH model.


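Theorem 7.5 (Asymptotic normality of the QMLE) Assume that $E\eta_t = 0$ and that assumptions A3, A4 and A8–A12 hold true. Then

$$\sqrt{n}(\hat\varphi_n - \varphi_0) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma),$$

where $\Sigma = \mathcal{J}^{-1}\mathcal{I}\mathcal{J}^{-1}$,

$$\mathcal{I} = E_{\varphi_0}\left(\frac{\partial\ell_t(\varphi_0)}{\partial\varphi}\frac{\partial\ell_t(\varphi_0)}{\partial\varphi'}\right), \qquad \mathcal{J} = E_{\varphi_0}\left(\frac{\partial^2\ell_t(\varphi_0)}{\partial\varphi\,\partial\varphi'}\right).$$

If, in addition, the distribution of $\eta_t$ is symmetric, we have

$$\mathcal{I} = \begin{pmatrix}I_1 & 0\\ 0 & I_2\end{pmatrix}, \qquad \mathcal{J} = \begin{pmatrix}J_1 & 0\\ 0 & J_2\end{pmatrix},$$

with

$$I_1 = (\kappa_\eta - 1)E_{\varphi_0}\left(\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\vartheta}\frac{\partial\sigma_t^2}{\partial\vartheta'}(\varphi_0)\right) + 4E_{\varphi_0}\left(\frac{1}{\sigma_t^2}\frac{\partial\epsilon_t}{\partial\vartheta}\frac{\partial\epsilon_t}{\partial\vartheta'}(\varphi_0)\right),$$

$$I_2 = (\kappa_\eta - 1)E_{\varphi_0}\left(\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\varphi_0)\right),$$

$$J_1 = E_{\varphi_0}\left(\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\vartheta}\frac{\partial\sigma_t^2}{\partial\vartheta'}(\varphi_0)\right) + 2E_{\varphi_0}\left(\frac{1}{\sigma_t^2}\frac{\partial\epsilon_t}{\partial\vartheta}\frac{\partial\epsilon_t}{\partial\vartheta'}(\varphi_0)\right),$$

$$J_2 = E_{\varphi_0}\left(\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\varphi_0)\right).$$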
Remark 7.4 


1. It is interesting to note that if $\eta_t$ has a symmetric law, then the asymptotic variance $\Sigma$ is block-diagonal, which is interpreted as an asymptotic independence between the estimators of the ARMA coefficients and those of the GARCH coefficients. The asymptotic distribution of the estimators of the ARMA coefficients depends, however, on the GARCH coefficients (in view of the form of the matrices $I_1$ and $J_1$ involving the derivatives of $\sigma_t^2$). On the other hand, still when the distribution of $\eta_t$ is symmetric, the asymptotic accuracy of the estimation of the GARCH parameters is not affected by the ARMA part: the lower right block $J_2^{-1}I_2J_2^{-1}$ of $\Sigma$ depends only on the GARCH coefficients. The block-diagonal form of $\Sigma$ may also be of interest for testing problems of joint assumptions on the ARMA and GARCH parameters.

2. Assumption A11 imposes the strict positivity of the GARCH coefficients and it is easy to see that this assumption constrains only the GARCH coefficients. For any value of $\vartheta_0$, the restriction of $\Phi$ to its first $P + Q + 1$ coordinates can be chosen sufficiently large so that its interior contains $\vartheta_0$ and assumption A8 is satisfied.

3. In the proof of the theorem, the symmetry of the iid process distribution is used to show the following result, which is of independent interest. If the distribution of $\eta_t$ is symmetric then,

$$\forall j, \quad E\left\{g(\epsilon_t^2, \epsilon_{t-1}^2, \dots)\,\epsilon_{t-j}\,f(\epsilon_{t-j-1}, \epsilon_{t-j-2}, \dots)\right\} = 0, \qquad (7.24)$$

provided this expectation exists (see Exercise 7.1).


Example 7.3 (Numerical evaluation of the asymptotic variance) Consider the AR(1)-ARCH(1) model defined by (7.23). In the case where $\eta_t$ follows the $\mathcal{N}(0, 1)$ law, condition A10 for the existence of a moment of order 4 is written as $3\alpha_0^2 < 1$, that is, $\alpha_0 < 0.577$ (see (2.54)). In the case where $\eta_t$ follows the $\chi^2(1)$ distribution, normalized in such a way that $E\eta_t = 0$ and $E\eta_t^2 = 1$, this condition is written as $15\alpha_0^2 < 1$, that is, $\alpha_0 < 0.258$. To simplify the computation, assume that $\omega_0 = 1$ is known. Table 7.3 provides a numerical evaluation of the asymptotic variance $\Sigma$, for these two distributions and for different values of the parameters $a_0$ and $\alpha_0$. It is clear that the asymptotic variance of the two parameters strongly depends on the distribution of the iid process. These experiments confirm the independence of the asymptotic distributions of the AR and ARCH parameters in the case where the distribution of $\eta_t$ is symmetric. They reveal that the independence does not hold when this assumption is relaxed. Note the strong impact of the ARCH coefficient on the asymptotic variance of the AR coefficient. On the other hand, the simulations confirm that in the case where the distribution is symmetric, the AR coefficient has no impact on the asymptotic accuracy of the ARCH coefficient. When the distribution is not symmetric, the impact, if there is any, is very weak. For the computation of the expectations involved in the matrix $\Sigma$, see Exercise 7.8. In particular, the values corresponding to $\alpha_0 = 0$ (AR(1) without ARCH effect) can be analytically computed. Note also that the results obtained for the asymptotic variance of the estimator of the ARCH coefficient in the case $a_0 = 0$ do not coincide with those of Table 7.1. This is not surprising because in that table $\omega_0$ is not supposed to be known.

Table 7.3 Matrices $\Sigma$ of asymptotic variance of the estimator of $(a_0, \alpha_0)$ for an AR(1)-ARCH(1), when $\omega_0 = 1$ is known and the distribution of $\eta_t$ is $\mathcal{N}(0, 1)$ or normalized $\chi^2(1)$.

                       α0 = 0          α0 = 0.1        α0 = 0.25       α0 = 0.5
a0 = 0
  ηt ~ N(0, 1)         1.00   0.00     1.14   0.00     1.20   0.00     1.08   0.00
                       0.00   0.67     0.00   1.15     0.00   1.82     0.00   2.99
  ηt ~ χ²(1)           1.00  −0.54     1.70  −1.63     2.78  −1.51
                      −0.54   0.94    −1.63   8.01    −1.51  18.78
a0 = −0.5
  ηt ~ N(0, 1)          …     0.00     0.82   0.00     0.83   0.00     0.72   0.00
                       0.00   0.67     0.00   1.15     0.00   1.82     0.00   2.99
  ηt ~ χ²(1)           0.75  −0.40     1.04  −0.99     1.41  −0.78
                      −0.40   0.94    −0.99   8.02    −0.78  18.85
a0 = −0.9
  ηt ~ N(0, 1)          …     0.00     0.19   0.00     0.18   0.00     0.13   0.00
                       0.00   0.67     0.00   1.15     0.00   1.82     0.00   2.98
  ηt ~ χ²(1)           0.19  −0.10     0.20  −0.19     0.21  −0.12
                      −0.10   0.94    −0.19   8.01    −0.12  18.90
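The moment bounds quoted in Example 7.3 follow from $\kappa_\eta\alpha_0^2 < 1$, with $\kappa_\eta = 3$ for $\mathcal{N}(0,1)$ and $\kappa_\eta = 3 + 12/k = 15$ for a $\chi^2(k)$ variable with $k = 1$, centered and scaled to unit variance. A two-function sketch (function names hypothetical):

```python
import math


def normalized_chi2_kurtosis(k: int) -> float:
    """E eta^4 for eta = (chi2(k) - k) / sqrt(2k), i.e. 3 + 12 / k."""
    return 3.0 + 12.0 / k


def arch1_fourth_moment_bound(kappa: float) -> float:
    """Supremum of alpha_0 satisfying kappa * alpha_0^2 < 1."""
    return 1.0 / math.sqrt(kappa)


for name, kappa in [("N(0,1)", 3.0), ("normalized chi2(1)", normalized_chi2_kurtosis(1))]:
    print(f"{name}: kappa = {kappa:g}, alpha_0 < {arch1_fourth_moment_bound(kappa):.3f}")
# N(0,1): kappa = 3, alpha_0 < 0.577
# normalized chi2(1): kappa = 15, alpha_0 < 0.258
```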


7.3 Application to Real Data 


In this section, we employ the QML method to estimate GARCH(1, 1) models on daily returns of 11 stock market indices, namely the CAC, DAX, DJA, DJI, DJT, DJU, FTSE, Nasdaq, Nikkei, SMI and S&P 500 indices. The observations cover the period from January 2, 1990 to January 22, 2009¹ (except for those indices for which the first observation is after 1990). The GARCH(1, 1) model has been chosen because it constitutes the reference model, by far the most commonly used in empirical studies. However, in Chapter 8 we will see that it can be worth considering models with higher orders $p$ and $q$.

Table 7.4 displays the estimators of the parameters $\omega$, $\alpha$, $\beta$, together with their estimated standard deviations. The last column gives estimates of $\rho_4 = (\alpha + \beta)^2 + (E\eta_t^4 - 1)\alpha^2$, obtained by replacing the unknown parameters by their estimates and $E\eta_t^4$ by the empirical mean of the fourth-order moment of the standardized residuals. We have $E\epsilon_t^4 < \infty$ if and only if $\rho_4 < 1$. The estimates of the GARCH coefficients are quite homogeneous over all the series, and are similar to those usually obtained in empirical studies of daily returns. The coefficients $\alpha$ are close to 0.1, and the coefficients $\beta$ are close to 0.9, which indicates a strong persistence of the shocks on the volatility. The sum $\alpha + \beta$ is greater than 0.98 for 10 of the 11 series, and greater than 0.96 for all the series. Since $\alpha + \beta < 1$, the assumption of second-order stationarity cannot be rejected, for any series (see Section 8.1). A fortiori, by Remark 2.6 the strict stationarity cannot be rejected. Note that the strict stationarity assumption, $E\log(\alpha_1\eta_t^2 + \beta_1) < 0$, seems difficult to test directly because it not only relies on the GARCH coefficients but also involves the unknown distribution of $\eta_t$. The existence of moments of order 4, $E\epsilon_t^4 < \infty$, is questionable for all the series because $(\hat\alpha + \hat\beta)^2 + (\hat E\eta_t^4 - 1)\hat\alpha^2$ is extremely close to 1. Recall, however, that the asymptotic properties of the QML do not require any moment on the observed process but do require strict stationarity.

¹ For the Nasdaq an outlier has been eliminated because the base price was reset on the trading day following December 31, 1993.

Table 7.4 GARCH(1, 1) models estimated by QML for 11 indices. The estimated standard deviations are given in parentheses. $\hat\rho_4 = (\hat\alpha + \hat\beta)^2 + (\hat E\eta_t^4 - 1)\hat\alpha^2$.

Index     ω               α               β               ρ4

CAC 0.033 (0.009) 0.090 (0.014) 0.893 (0.015) 1.0067
DAX 0.037 (0.014) 0.093 (0.023) 0.888 (0.024) 1.0622
DJA 0.019 (0.005) 0.088 (0.014) 0.894 (0.014) 0.9981
DJI 0.017 (0.004) 0.085 (0.013) 0.901 (0.013) 1.0020
DJT 0.040 (0.013) 0.089 (0.016) 0.894 (0.018) 1.0183
DJU 0.021 (0.005) 0.118 (0.016) 0.865 (0.014) 1.0152
FTSE 0.013 (0.004) 0.091 (0.014) 0.899 (0.014) 1.0228
Nasdaq 0.025 (0.006) 0.072 (0.009) 0.922 (0.009) 1.0021
Nikkei 0.053 (0.012) 0.100 (0.013) 0.880 (0.014) 0.9985
SMI 0.049 (0.014) 0.127 (0.028) 0.835 (0.029) 1.0672
S&P 500 0.014 (0.004) 0.084 (0.012) 0.905 (0.012) 1.0072
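As a consistency check on Table 7.4, the sketch below recomputes $\rho_4 = (\alpha + \beta)^2 + (E\eta_t^4 - 1)\alpha^2$ and inverts it to recover the estimate of $E\eta_t^4$ implied by the reported $\hat\alpha$, $\hat\beta$ and $\hat\rho_4$. Because the table rounds $\hat\alpha$ and $\hat\beta$ to three decimals, the implied kurtosis is only approximate. The function names are hypothetical; the numbers are those of Table 7.4.

```python
# (alpha_hat, beta_hat, rho4_hat) as reported in Table 7.4
TABLE_7_4 = {
    "CAC": (0.090, 0.893, 1.0067),
    "DAX": (0.093, 0.888, 1.0622),
    "DJA": (0.088, 0.894, 0.9981),
    "DJI": (0.085, 0.901, 1.0020),
    "DJT": (0.089, 0.894, 1.0183),
    "DJU": (0.118, 0.865, 1.0152),
    "FTSE": (0.091, 0.899, 1.0228),
    "Nasdaq": (0.072, 0.922, 1.0021),
    "Nikkei": (0.100, 0.880, 0.9985),
    "SMI": (0.127, 0.835, 1.0672),
    "S&P 500": (0.084, 0.905, 1.0072),
}


def rho4(alpha, beta, kappa):
    """rho_4 = (alpha + beta)^2 + (kappa - 1) alpha^2; E eps_t^4 < inf iff rho_4 < 1."""
    return (alpha + beta) ** 2 + (kappa - 1.0) * alpha ** 2


def implied_kappa(alpha, beta, rho4_hat):
    """Invert rho4() for kappa = E eta_t^4, given alpha, beta and rho_4."""
    return 1.0 + (rho4_hat - (alpha + beta) ** 2) / alpha ** 2


for name, (a, b, r) in TABLE_7_4.items():
    print(f"{name:8s} alpha+beta = {a + b:.3f}  implied E eta^4 = {implied_kappa(a, b, r):.2f}")
```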


7.4 Proofs of the Asymptotic Results* 


We denote by $K$ and $\rho$ generic constants whose values can vary from line to line. As an example, one can write for $0 < \rho_1 < 1$ and $0 < \rho_2 < 1$, $i_1 \geq 0$, $i_2 \geq 0$,

$$0 < K\sum_{i\geq i_1}\rho_1^i + K\sum_{i\geq i_2}i\rho_2^i \leq K\rho^{\min(i_1, i_2)}.$$


Proof of Theorem 7.1 
The proof is based on a vectorial autoregressive representation of order 1 of the vector $\underline{\sigma}_t^2 = (\sigma_t^2, \sigma_{t-1}^2, \dots, \sigma_{t-p+1}^2)'$, analogous to that used for the study of stationarity. Assumption A2 allows us to write $\sigma_t^2$ as a series depending on the infinite past of the variable $\epsilon_t^2$. It can be shown that the initial values are not important asymptotically, using the fact that, under the strict stationarity assumption, $\epsilon_t^2$ necessarily admits a moment of order $s$, with $s > 0$. This property also allows us to show that the expectation of $\ell_t(\theta_0)$ is well defined in $\mathbb{R}$ and that $E_{\theta_0}(\ell_t(\theta)) - E_{\theta_0}(\ell_t(\theta_0)) \geq 0$, which guarantees that the limit criterion is minimized at the true value. The difficulty is that $E_{\theta_0}(\ell_t(\theta))$ can be equal to $+\infty$. Assumptions A3 and A4 are crucial to establishing the identifiability: the former assumption precludes the existence of a constant linear combination of the $\epsilon_{t-j}^2$, $j \geq 0$. The assumption of absence of common root is also used. The ergodicity of $\ell_t(\theta)$ and a compactness argument conclude the proof.
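It will be convenient to rewrite (7.10) in matrix form. We have

$$\underline{\sigma}_t^2 = \underline{c}_t + B\underline{\sigma}_{t-1}^2, \qquad (7.25)$$

where

$$\underline{\sigma}_t^2 = \begin{pmatrix}\sigma_t^2\\ \sigma_{t-1}^2\\ \vdots\\ \sigma_{t-p+1}^2\end{pmatrix}, \quad \underline{c}_t = \begin{pmatrix}\omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2\\ 0\\ \vdots\\ 0\end{pmatrix}, \quad B = \begin{pmatrix}\beta_1 & \beta_2 & \cdots & \beta_p\\ 1 & 0 & \cdots & 0\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & 1 & 0\end{pmatrix}. \qquad (7.26)$$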


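We will establish the following intermediate results.

(a) $\lim_{n\to\infty}\sup_{\theta\in\Theta}|I_n(\theta) - \tilde I_n(\theta)| = 0$, a.s.

(b) $\left(\exists t \in \mathbb{Z} \text{ such that } \sigma_t^2(\theta) = \sigma_t^2(\theta_0)\ P_{\theta_0}\text{ a.s.}\right) \Rightarrow \theta = \theta_0$.

(c) $E_{\theta_0}|\ell_t(\theta_0)| < \infty$, and if $\theta \neq \theta_0$, $E_{\theta_0}\ell_t(\theta) > E_{\theta_0}\ell_t(\theta_0)$.

(d) For any $\theta \neq \theta_0$, there exists a neighborhood $V(\theta)$ such that

$$\liminf_{n\to\infty}\ \inf_{\theta^*\in V(\theta)\cap\Theta}\tilde I_n(\theta^*) > E_{\theta_0}\ell_1(\theta_0), \quad \text{a.s.}$$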


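(a) Asymptotic irrelevance of the initial values. In view of Corollary 2.2, the condition $\sum_{j=1}^{p}\beta_j < 1$ of assumption A2 implies that $\rho(B) < 1$. The compactness of $\Theta$ implies that

$$\sup_{\theta\in\Theta}\rho(B) < 1. \qquad (7.27)$$

Iterating (7.25), we thus obtain

$$\underline{\sigma}_t^2 = \underline{c}_t + B\underline{c}_{t-1} + B^2\underline{c}_{t-2} + \cdots + B^{t-1}\underline{c}_1 + B^t\underline{\sigma}_0^2 = \sum_{k=0}^{\infty}B^k\underline{c}_{t-k}. \qquad (7.28)$$

Let $\underline{\tilde\sigma}_t^2$ be the vector obtained by replacing $\sigma_{t-i}^2$ by $\tilde\sigma_{t-i}^2$ in $\underline{\sigma}_t^2$, and let $\underline{\tilde c}_t$ be the vector obtained by replacing $\epsilon_0^2, \dots, \epsilon_{1-q}^2$ by the initial values (7.6) or (7.7). We have

$$\underline{\tilde\sigma}_t^2 = \underline{c}_t + B\underline{c}_{t-1} + \cdots + B^{t-q-1}\underline{c}_{q+1} + B^{t-q}\underline{\tilde c}_q + \cdots + B^{t-1}\underline{\tilde c}_1 + B^t\underline{\tilde\sigma}_0^2. \qquad (7.29)$$

From (7.27), it follows that almost surely

$$\sup_{\theta\in\Theta}\left\|\underline{\sigma}_t^2 - \underline{\tilde\sigma}_t^2\right\| = \sup_{\theta\in\Theta}\left\|\sum_{k=1}^{q}B^{t-k}\left(\underline{c}_k - \underline{\tilde c}_k\right) + B^t\left(\underline{\sigma}_0^2 - \underline{\tilde\sigma}_0^2\right)\right\| \leq K\rho^t, \quad \forall t. \qquad (7.30)$$

For $x > 0$ we have $\log x \leq x - 1$. It follows that, for $x, y > 0$, $|\log(x/y)| \leq |x - y|/\min(x, y)$. We thus have almost surely, using (7.30) and $\sigma_t^2, \tilde\sigma_t^2 \geq \omega$,

$$\sup_{\theta\in\Theta}|I_n(\theta) - \tilde I_n(\theta)| \leq n^{-1}\sum_{t=1}^{n}\sup_{\theta\in\Theta}\left\{\left|\frac{\tilde\sigma_t^2 - \sigma_t^2}{\tilde\sigma_t^2\sigma_t^2}\right|\epsilon_t^2 + \left|\log\left(\frac{\sigma_t^2}{\tilde\sigma_t^2}\right)\right|\right\} \leq \left\{\sup_{\theta\in\Theta}\frac{1}{\omega^2}\right\}Kn^{-1}\sum_{t=1}^{n}\rho^t\epsilon_t^2 + \left\{\sup_{\theta\in\Theta}\frac{1}{\omega}\right\}Kn^{-1}\sum_{t=1}^{n}\rho^t.$$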
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The existence of a moment of order s > 0 for E deduced from assumption A1 and Corollary 2.3, 
allows us to show that pe? — Q0 a.s. (see Exercise 7.2). Using Cesaro’s lemma, point (a) follows. 


(b) Identifiability of the parameter. Assume that hra (0) = g (80), Pa a.s. By Corollary 2.2, the 
polynomial Bg(B) is invertible under assumption A2. Using (7.10), we obtain 
— — An (B) | 2- 2 _ 2 
Bo(B) Ba (B)} © Bal) B0) 


If the operator in B between braces were not null, then there would exist a constant linear combi- 
nation of the e2 j j = 0. Thus the linear innovation of the process (€7) would be equal to zero. 
Since the distribution of nA is nondegenerate, in view of assumption A3, 


e? — Ep (e7le7_1,---) = 0? (0)(n? — 1) #0, with positive probability. 


We thus have 
Ao (z) = An) 
Boz) Balz)’ 


Under assumption A4 (absence of common root), it follows that Ag (z) = Aa (z), Ba (2) = Ba (z) 
and w = wo. We have thus shown (b). 


te 
Ba (1) 7 Boy (1) 


V|z| <1 and (7.32) 


(c) The limit criterion is minimized at the true value. The limit criterion is not integrable 
at any point, but Eg 1,(6) = Eo, €:(@) is well defined in RU {+00} because, with the notation 
xT = max(—x,0) and xt = max(x, 0), 


Eol; (0) < Ea log” o? < max{0, —logw} < 00.” 

It is, however, possible to have Ea,£:(8) = co for some values of 6. This occurs, for instance, 
when 6 = (w,0,..., 0) and (€+) is an IGARCH such that Ege? = oo. We will see that this cannot 
occur at 0o, meaning that the criterion is integrable at 6o. To establish this result, we have to show 
that Et; (0) < oo. Using Jensen’s inequality and, once again, the existence of a moment of 
order s > O for é: we obtain 


Eo log* o? (00) < 00 


because 
2 1 49 S 1 2 AY 
Ev, log o7 (60) = Eg- log {oF ()} < = log Eg {07 (00) } < 00. 


Thus 
aF (0o)n? 


Eol: (80) = Eo | o2 (0o) 
t 


+ logoo) = 1 + Eg log o7 (00) < 00. 


Having already established that Eo); (00) < oo, it follows that Ee,£:(80) is well defined in R. 
Since for all x > 0, logx < x — 1 with equality if and only if x = 1, we have 


a? (@) of (0) n? 
Eo £:(0) — E £,(09) = Eo, log —— + he eh Se no 
69 %t o*t i oe ( 00) 8o oe ( 0) ot 
= Eq, log aO) + Eg or (60) — 
Daa of (60) ° a; (8) 
oF (0) o? (80) 
> Eo fiog ; r og 5 5 | = (1.33) 
of ( o) of (0) 


2 We use here the fact that (f + g)~ < g7 for f > 0, and that if f < g then f7 > g7. 
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with equality if and only if o? (0o)/07 (0) = l Pa-a.s., that is, in view of (b), if and only 
if 0 = 6. 


(d) Compactness of © and ergodicity of (£;(0)). For all 6 € © and any positive integer k, let 
V,(@) be the open ball of center 0 and radius 1/k. Because of (a), we have 


liminf inf 1,(0*)>liminf inf  1,(0*)-— lim sup sup |l, (8) —1,,(6)| 


N00 8*eV; (0N n=>œ 8*eV;(0)NO n=>œ 0€O 


> liminfn7~ aA inf 0, (6*). 


n> oo EV (A)NO 


To obtain the convergence of this empirical mean, the standard ergodic theorem cannot be applied 
(see Theorem A.2) because we have seen that ¢;(9*) is not necessarily integrable, except at 60. 
We thus use a modified version of this theorem, which allows for an ergodic and strictly stationary 
sequence of variables admitting an expectation in R U {+00} (see Exercise 7.3). This version of the 
ergodic theorem can be applied to {¢,(0*)}, and thus to {infgxey,@)ne €:(@*)} (see Exercise 7.4), 
which allows us to conclude that 


n 


liminfn7' $ inf 0,(0*)= Eq, inf _€,(6*). 
n—>0o a] 8* eV (80)NO 8* EV (80)NO 


By Beppo Levi’s theorem, Eø inforev (nol (0*) increases to Egl1(0) as k — oo. Given (7.33), 
we have shown (d). 

The conclusion of the proof uses a compactness argument. First note that for any neighborhood 
V (80) of A, 


lim sup inf hh (6*) < lim (o) = lim 1,,(4)) = Ey €1 (00). (7.34) 
n= O*EV (A noo n> 
The compact set © is covered by the union of an arbitrary neighborhood V (69) of 0o and the set 
of the neighborhoods V (0) satisfying (d), 9 € © \ V (8o). Thus, there exists a finite subcover of © 
of the form V (60), V(@1),..., V (0%), where, fori = 1,...,k, V(6;) satisfies (d). It follows that 
inf 1,(0) = inf 1,(6). 
jo 0) i= a k oEONV (O) 0) 


The relations (d) and (7.34) show that, almost surely, 8, belongs to V(09) for n large enough. 
Since this is true for any neighborhood V (69), the proof is complete. 


Proof of Theorem 7.2 


The proof of this theorem is based on a standard Taylor expansion of criterion (7.8) at 49. Since 6, 
converges to 0o, which lies in the interior of the parameter space by assumption AS, the derivative 
of the criterion is equal to zero at 0„. We thus have 


7 n a... 
O=n Py 5g en) 
t=1 


z ay len. 07 3 ” 
p2 D 59 tt Oo) + (> 2 76,90, ——#,(6} >) Jn (Ô, — 60) (7.35) 


t=1 


3 To show (7.33) it can be assumed that Ea | log o? (0)| < oo and that Egle? /o7 (0)| < oo (in order to use 
the linearity property of the expectation), otherwise E,,¢,;(@) = +00 and the relation is trivially satisfied. 
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where the GF; are between ô, and 6. It will be shown that 


n 9 7 £ 
-1/2 
n7" 2 39 (1 (0) + N (0, (kn — DJ), (7.36) 
and that 
a 90,0," i) — J(i, j) in probability. (7.37) 


The proof of the theorem immediately follows. We will split the proof of (7.36) and (7.37) into 
several parts: 


86; (00) 


2hr ooo der Ho 
-I 0000 


< œ, Eo < OO. 


(a) Eo 


(b) J is invertible and Varg |460} = fre, — 1} J. 


(c) There exists a neighborhood V(0o) of @ such that, for all i, j,k € {1,...,p+4 +1}, 


334, (0) 
Eo SP | 56,00,00 
GEV(O9) k 
-1/2 alio) _ 3o) -15y [2u _ ae) : 
(a) |» = | eo 30 30 H| and supgeyap) ||” a 3000" 3000" tend in 


probability to 0 as n > oo. 
mo L£ 
(e) n2 E ines 5 N(0, (ky — DJ). 
O a Dii E m Oh) > JC, j) as. 


(a) Integrability of the derivatives of the criterion at 0. Since ¢;(0) = €?/o7 + log of, 


we have 
34, (0) 1 
ar -(1- 3] (43) g% 
AOE 1 e ee 1 ðo? 
3036" ={1- SN ~—|. {255 HE 30 2 ag j’ ae 


At 6 = 6p, the variable €e? /o? = n? is independent of o? and its derivatives. To show (a), it thus 
suffices to show that 


Ex 1 30? @)| < È 1 do ao? j awe 807 6 (7.40) 
Sr DO; X 
ae 00 °° 60 | 52 0006" 9 o [za 36 a0" || < 
In view of (7.28), we have 
3o? lo) o CO 
eh aes a41) 
2 k=0 k=0 
do? = : Bİ- 1 p(j) pk-i 
ap; eB | ee (7.42) 
J k=1 i=l 
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where 1 = (1,0,..., 0)’, €? = (€7,0,..., 0)’, and BY is a p x p matrix with 1 in position (1, j) 
and zeros elsewhere. Note that, in view of the positivity of the coefficients and (7.41)—(7.42), the 
derivatives of ör are positive or null. In view of (7.41), it is clear that 3o? /dw is bounded. Since 
of > w>0, the variable {807/3w}/o7 is also bounded. This variable thus possesses moments of 
all orders. In view of the second equality in (7.41) and of the positivity of all the terms involved 
in the sums, we have 


ao2 CO CO 
Th = D B oie? pi = 5 Big = a 
Oi k=0 k=0 
It follows that 
1 ðo 1 
ee 7.43 
o? 0a; ~ Qj ( ) 


The variable 0, (8o? /da;) thus admits moments of all orders at 0 = 69. In view of (7.42) and 
B;B < B, we have 


oo 


Using (7.27), we have || B*|| < Kø for all k. Moreover, e having a moment of order s € (0, 1), the 
variable c, (1) = œ + Da 1 Oi e? į; has the same moment. 4 Using in addition (7.44), the inequality 
o> at BEAL, 1c,_,(1) and the telatión x/(1 +x) < x’ for all x > 0,> we obtain 


2 œ k 
ye = TSY = as a 
o? OB; Bj & o + BHI, DeC) 
B, De 0) 
s= Ly mal == 
Pj k=1 2 
K: = 2k 
oe — Ea {cD} X kp Ea (7.45) 
Bj ome a 


Under assumption A5 we have o; > 0 for all j, which entails that the first expectation in (7.40) 
exists. 

We now turn to the higher-order derivatives of g7 In view of the first equality of (7.41), 
we have 


9262 9262 2 Ga 
Si _ Lla o = Bi-1 BG) BEEN 1. 7.46 
Jw? dda; and — T =} 13 = i 
We thus have 
32o oo 
bi- < J kB*L, 
dwoB; = 


4 We use the inequality (a + b)? < aë + b° for all a, b > O and any s € (0, 1]. Indeed, x* > x for all 
x € [0, 1], and ifa +b>0, (5y + (A) 2 tL 
SIfx > 1thenx’>1>x/( (+x). IfO <x < 1 then x° > x > x/(1 +x). 
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which is a vector of finite constants (since p(B) < 1). It follows that 0707 (60) /d@06; is bounded, 
and thus admits moments of all orders. It is of course the same for [3*0 (80)/ 3x30; } /a? (00). 
The second equality of (7.41) gives 


a202 


CO k 
2i = —_ i—l p(j) pk-i 
=0 d = 3 J B=- BY B 7.47 
, an dQ; a 2 teas ~ ( ) 


ddi 0a; 


The arguments used for (7.45) then show that 


Z o2 , , * 
3 o; Pah a 
Ot Bj 


This entails that {0°07 (0)/da; 90} /o7 (0) is integrable. Differentiating relation (7.42) with respect 
to Bj, we obtain 


iad k i-1 
BiB isa a T = Pjby 5. È (È Bo! Bui ee) sos) 


k=2 Li=2 {=1 


k-1 k-i 
a. TE ( gigis l Gi» 
i=l é=1 


fore) k k-1 
2), bx —1)Bk +) k- z AR 
i=2 i=1 


Eg, 


=J kk- DB (7.48) 


because pj;BY < B. As for (7.45), it follows that 


Ta P/B; oB K* 
of $ BiBy 
and the existence of the second expectation in (7.40) is proven. 


Since {407/d«} /o? is bounded, and since by (7.43) the variables {90} /ð«; } /o7 are bounded 


at ĝo, it is clear that 
1 007(6o) 3o? (60) 


of (0o) 96; 30 


for i = 1,...,q + 1. With the notation and arguments already used to show (7.45), and using the 
elementary inequality x/(1 + x) < x‘/* for all x > 0, Minkowski’s inequality implies that 


s(t a e a 
: of (60) 0B; = Boj = w0 


Finally, the Cauchy—Schwarz inequality entails that the third expectation of (7.40) exists. 


(b) Invertibility of J and connection with the variance of the erenn derivative. Using (a), 
and once again the independence between n? = €? /o? (0o) and o? and its derivatives, we have 
by (7.38), 


dl; (0 
Eal 1( o) 


1 0070) | _ 
a0 7 


| = Eq a nr) Eo [a 6 
t 
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Moreover, in view of (7.40), J exists and satisfies (7.13). We also have 


dL) | _ dL: (80) 3L: (Go) 
30 | = | a0 a6! | oy 


Vary | 


3o? (00)/30 da? (80) /30" | 


=E{(1- n)’ } Ew | o2 (00) o? (0o) 


= {x= 1}J. 


Assume now that J is singular. Then there exists a nonzero vector à in R?+%+! such that 
A {807 (60)/30} = 0 a.s.° In view of (7.10) and the stationarity of {A07(0)/00},, we have 


1 1 
2 2 
€] Ei 
3o20 l P ðo (00) i 
o= e Dy Gq [+ BN aN eg 
op 1 (60) j=l oF 1 (9) 
oP; p (90) oP. p (4) 
Let A= (ào, A1,-..,Ag+p)’. It is clear that A; =0, otherwise e2] would be measurable 
with respect to the o-field generated by {n,, u < t— 1}. For the same reason, we have 
Ag =+++ = Ài = 0 if Aggy) =+ =Ag4i = 0. Consequently, A # 0 implies the existence of a 


GARCH(p — 1, g — 1) representation. By the arguments used to show (7.32), assumption A4 
entails that this is impossible. It follows that 24’JA = 0 implies à = 0, which completes the 
proof of (b). 


(c) Uniform integrability of the third-order derivatives of the criterion. Differentiating (7.39), 


we obtain 
06) i= e 30? (7.50) 
30;00;90, o2 | | a2 86;00;30 i 
e 1 ðo?) {1 37a? 
+ 1752" e a0, J Loz 30,00 
Of er i O; JSK 
x 1 1 3o? 1 3°0? 
of a? 30; | | o2 06:06 
of Qf 007) [1 070? 
oś o? 30k o? 30:90; 
ii [2] (22) 
o? o? 00; of ð j o? 30k 


We begin by studying the integrability of {1 — é? /o7}. This is the most difficult term to deal 
with. Indeed, the variable €?/c7 is not uniformly integrable on ©: at 6 = (w, 0’), the ratio €7/a7 is 


2 2 
j= ple (x2) = 0 
o (00) a0 


if and only if 0,7) (2'd02(69)/30)° = 0 a.s., that is, if and only if (2'002(@)/30)” =O0as. 


6 We have 
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integrable only if Ee? exists. We will, however, show the integrability of {1 — €?/o7} uniformly in 
@ in the neighborhood of ĝo. Let ©* be a compact set which contains 0 and which is contained in 
the interior of © (Y8 € ©*, we have 0 > 8, > 0 component by component). Let Bo be the matrix B 


(defined in (7.26)) evaluated at the point 0 = 69. For all 6 >Q, there exists a neighborhood V(4) 
of 0o, included in ©*, such that for all 6 € V(6o), 


Bo <(1+4)B (ie. Boli, j) < (1 +5)B(i, j) for all i and j). 


Note that, since V(@9) C ©*, we have SUPA eVo) 1/a; < oo. From (7.28), we obtain 


co q fo) 
of =o) BI + > a; {> ad, veal 
k=0 j=l k=0 


and, again using x/(1 + x) < x° for all x > 0 and all s € (0, 1), 


2 oO k q oo BK 1.1 2 , 
o; a á mi wo Zro BoC, 1) f Ya; y oC Kita 
BEVO) w toy OF BEL, Dek 


OEV) Ot i=1 


q © pk k 2 7 
oi Bod, 1) fa Brd, Deny; 
<K+>) > sup [eran ea 


<=] PEV) 


q œ 
TAZ N Gay pers. (7.51) 
i=l k=0 


| a 


If s is chosen such that E €?s < oo and, for instance, ô = (1 — p*)/(2p*), then the expectation of 
the previous series is finite. It follows that there exists a neighborhood V(69) of 6 such that 


F J 
er o‘ (0 
Eg, sup + = Eø sup — (60) < œ 


OEVOo) Ft OEV) Or 


4s 


Using (7.51), keeping the same choice of ô but taking s such that Eeft < oo, the triangle inequality 


gives 
2 2 
€ A 
sup eke K}? a; (00) 
9eV(O0) PF |}, OEV) Or 
oo 
< GPK +P Ka YO A +8 |e], < 00. (1.52) 
k=0 


Now consider the second term in braces in (7.50). Differentiating (7.46), (7.47) and (7.48), 
with the arguments used to show (7.43), we obtain 


1 8o? 
sup — ——.——— < kK, 
dcoO* o 00; 06; 706}; 


when the indices ij, i2 and i3 are not all in {q + 1,4 +2,...,q +1 + p} (that is, when the 
derivative is taken with respect to at least one parameter different from the £;). Using again 
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the arguments used to show (7.44) and (7.48), and then (7.45), we obtain 


ao? 


= = k 
TA < J kk- DE- DBA, De, 


k=3 


Bibi brk 


a <8 ara] e [ap ea} 
= < K k(k — 1)(k = 2p" 1 
ocer oF Bdp locer w EFF Br 2 ESIE e eee 


for any s € (0, 1). Since Ea {supgcos cD} < œ for some s > 0, it follows that 


2 


1 ðo? 
a 206, (7.53) 


E TE 
o D. o2 00,00;30; 


dEo* 


It is easy to see that in this inequality the power 2 can be replaced by any power d: 


3302 d 
Eo sup =e 
peor | oF 30; 30; 00x 


Using the Cauchy—Schwarz inequality, (7.52) and (7.53), we obtain 


i el(i ee 
- Sty ao | XK 
o? o? 30:90; 30k 


The other terms in braces in (7.50) are handled similarly. We show in particular that 


Eo, sup 


AEVO) 


0 


1 8o? 


Eo, su 
o AD, (Gp 30,00; 


1 3o?! 
o? 06; 


sup 
deo* 


<0, (7.54) 


for any integer d. With the aid of Hélder’s inequality, this allows us to establish, in particular, that 


€ 1 ðo 1 ðo? 1 ðo? 
ma w |[2— 65} [re al aga Lae | 
EVA) O; O; i o O, k 
e? 1 ðo? 
< || sup |2 — 6—3||| max || sup oO. 
0<V(O) ar |], i lec 2 80; 


Thus we obtain (c). 


(d) Asymptotic decrease of the effect of the initial values. Using (7.29), we obtain the analogs 
of (7.41) and (7.42) for the derivatives of a: 


ae2 4 de, ae2 

= = 3) elo ae k OG = + Bit, (7.55) 
@ w 

að? — k dé ač? 

—= B*e B S p 7. 

BOA 2 s- k-i T 3 ddi T ða; ( 20) 


a 2 tl~q k . re 4 tok ` A tag 
F = $ {> BBO Be te t YOYO BBO at ha, (7.57) 
j 


i=1 k=l tis 
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where ač? /dw is equal to (0, ..., 0)’ when the initial conditions are given by (7.7), and is equal 
to (1,..., 1) when the initial conditions are given by (7.6). The second-order derivatives have 
similar expressions. The compactness of © and the fact that p(B) < 1 together allow us to claim 


that, almost surely, 


3o? a6? ee 0°07 076? Kot. Vt (7.58) 
su — — | < , su — < g : : 
ablaa 00 pe eee 3000" 3090 e 
Using (7.30), we obtain 
1 1 Qed K t 2 
ee eee , etic (7.59) 
Or Ot Of Ot Or OF 
Since 
3f, (0) _ ime 1 86? eid dL) _ _& 1 40/7 
00 67 6? 00 00 o o? 30 
we have, using (7.59) and the first inequality in (7.58), 
eae =|[5- S| (Se h- St {2-21 {eh 
06; 06; 6 of J lo? 06; õe | le? a?) | 20; 
2 2 ~2 
€; 1 00; 06; 
—- >} {— — — Į 0 
+| eha 20, {| 
E : of (6) 96; 
It follows that 
Liya $ f 9260) — 920) aay 2 a C) 
1/2 aa < K*n `!” +n |1 : 7.60 
r >| a y |e a L +m) + aay a0, (7.60) 


Markov’s inequality, (7.40), and the independence between 7; and o? (4) imply that, for 


all £ > 0, 
z 1 00;2(4) 
P |n! HET ANES : >e 
( 2 CID a ag 
2 1 a02(6) “ 
<-(1+&,|—=—-— Jew p' > 0, 
E ( lofo) 90 > 


which, by (7.60), shows the first part of (d). 


GARCH QMLE 167 


Now consider the asymptotic impact of the initial values on the second-order derivatives of 
the criterion in a neighborhood of 69. In view of (7.39) and the previous computations, we have 


sup nv! ” E = Tae) 
sevo] | 30:90; 3030; 
g 2 2 y sD 
— € € 1 do, 
cS [E-A] 
p=] PEV) | LOF oO; Op Corer y 
e 1 1 \ 3o? i / 30? aa? 
ADE (ee — e +— = 
O; of 6? 06; 00; 6? 06,00; 06,00; 
+f -255| {=| {=| 
op ~=— GJ Lo? 06; J (o2 30; 
ea GD 
A oj 6; J 06; 6; \ 06; 00; of 90; 
ee-e E-H) 
6; 96; Ot 6; J ð j orm 00; 0; 


where 


3 {1+} {i+ 1 8o? ‘ 1 ðo? 1 “| 
= sup = = + — — I - 
‘ OEV) oF of 30:90; of 30; of 06; 


In view of (7.52), (7.54) and Hélder’s inequality, it can be seen that, for a certain neighborhood 
V(6), the expectation of Y; is a finite constant. Using Markov’s inequality once again, the second 
convergence of (d) is then shown. 


(e) CLT for martingale increments. The conditional score vector is obviously centered, which 
can be seen from (7.38), using the fact that 0? (80) and its derivatives belong to the o -field generated 
by {€i, i = 0}, and the fact that Eg, (€7le,, u < t) = oF (80): 


E (2e (60) | €u, U < t nee 4 42) Eo (0; (00) — €? | eu, u < t) =0 
8o 30 t UO us = oF (6) 30 t UO A t WO t us =U. 
Note also that, by (7.49), Vare, (0;(60)/00) is finite. In view of the invertibility of J and the 
assumptions on the distribution of n; (which entail 0 < k — 1 < 00), this covariance matrix is 
nondegenerate. It follows that, for all A € R?+49+! | the sequence fa & L (00), E} i is a square 
integrable ergodic stationary martingale difference. Corollary A.1 and the Cramér—Wold theorem 


(see, for example, Billingsley, 1995, pp. 383, 476 and 360) entail (e). 


(f) Use of a second Taylor expansion and of the ergodic theorem. Consider the Taylor expansion 
(7.35) of the criterion at 6). We have, for all i and j, 


n 32 n 32 n a 32 7 
—1 * —1 —l1 * 
£,(6*) = — 1 — }_* eð) (0% —@), 
" 2 7030; L 2 7030; a LL a9 {saan i Ji pe) 


(7.61) 
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where bj; is between OF and 6). The almost sure convergence of 6; j to 0, the ergodic theorem 
and (c) imply that almost surely 


n 


a a? 
<li = — }__4 
ae ee 00" sare. r( | 


a a? 
T —— 40) 
30:30; 


Since lo% — 09|| — O almost surely, the second term on the right-hand side of (7.61) converges 
to 0 with probability 1. By the ergodic theorem, the first term on the right-hand side of (7.61) 
converges to J(i, j). 

To complete the proof of Theorem 7.2, it suffices to apply Slutsky’s lemma. In view of (d), 
(e) and (f) we obtain (7.36) and (7.37). 


lim sup 
n> 


D3 a a pes 
n = ETETA ij 
4 96" | 6,00; Y 


= Eg sup 
OEV) 


< Ow. 


Proof of the Results of Section 7.1.3 


Proof of Lemma 7.1. We have 


n—1 
Ea 2 2 2 2. 
p”hn -soi +) aaka + p" am, 1 -NTE 
t=1 
n—1 
n 2 
2p œ | [ eon. 
t=1 
Thus 
n—-1 


E EEA. soni 2 2 
liminf — log p"h, > lim — } log pwo + y log paon; ¢ = E log paon; > 0, 
n>o n 


n>œ n 
t=1 


using (7.18) for the latter inequality. It follows that log p”h,, and thus p"h,, tend almost surely 
to +00 as n > oo. Now if p"h, > +00 and p"e? = p"hyn> > +00, then for any ¢>0, the 
sequence (7?) admits an infinite number of terms less than £. Since the sequence (7?) is ergodic 
and stationary, we have Po} < £) >Q. Since e is arbitrary, we have P? = 0) > 0, which is in 
contradiction to (7.16). 


Proof of (7.19). Note that 
(Ôn, Ân) = arg min Qn (0), 
PTS 


where 


1 n 
On(0) = — $ 14O) — L (00). 


t=1 


We have 


1< o (80) o? (0) 
O)=- ayer =i stoe = 
= dt | 20) |+ E G2 Oo) 


n 2 2 
1 x (@0 =w) + (Qo = aE, w+ ae] 
=- y > tH lor ————. 
n @+ Ae wo T ADE; | 
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For all 0 € ©, we have a ¥ 0. Letting 


n 


1 5 (Ao — @) a 
On (a) = DD ——— t log — 


a 
t=1 0 


and 
(Mp — w) — w («o — a) 
d, oo ÅĄą——— ood 


’ 


alw +ae? |) 


we have 


n 


eae 1 
Qn (0) — On (a) = i > Ni dj. + = X log 


" (w+ ae?_,)ao 
t=1 t=1 


z —>0 as. 
(wo + &oef_;)& 


since, by Lemma 7.1, e — oo almost surely as t —> oo. It is easy to see that this convergence is 
uniform on the compact set ©: 


lim sup|Q,(@) — O,(@)| =O as. (7.62) 


noo dcO 


Let aw) and ag be two constants such that ay < œo < a . It can always be assumed that 0 < a . 
With the notation 67 = n~! )7/_, n7, the solution of 


až = arg min On (a) 


isa, = aô. This solution belongs to the interval (aq , ag ) when n is large enough. In this case 


o** = arg min , On(@) 


n jai 
ag lag do ) 
is one of the two extremities of the interval (a , ag ), and thus 


lim O,(o**) = min | lim O, (a>), lim On(as)} >0. 
n—> oo n—> oo n—> oo 


This result and (7.62) show that almost surely 


lim min : Q,(0) >0. 


n> GEO, agla ag ) 


Since ming Q, (0) < Qn (80) = 0, it follows that 


: : - „+ 
im arg Qn (0) € (0, œ) x (Ap , Aq )- 


Since (œg stig ) is an interval which contains œọ and can be arbitrarily small, we obtain 
the result. 


To prove the asymptotic normality of the QMLE, we need the following intermediate result. 
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Lemma 7.2 Under the assumptions of Theorem 7. 


2 sup 


] 2EO 


- Lul 


n mr) 


3, we have 


<œ a.s., 


<œ a.S., 


=o(1) as., 


=O(1) as. 


(7.63) 


(7.64) 


(7.65) 


(7.66) 


Proof. Using Lemma 7.1, there exists a real random variable K and a constant p € (0, 1) inde- 


pendent of 6 and of t such that 


—(w9 + a€?_,)n? 


ð 
—4(0)| = 
a tÈ | | (w+ ae? _,)? 


2 
w+ MEF) 


< Køn? +1). 


Since it has a finite expectation, the series bear Kp' n? + 1) is almost surely finite. This shows 


(7.63), and (7.64) follows similarly. We have 


PL(,a0) 1 _ | (ot aon _ Ge 
da2 a w + ae, (@ + age? 1)? 
4 
€ 1 
= (2ņ -1 = == tri, 
( i aay a i 
2( 1) l Hr r 
= 2 (7 a ey eae 
where j 4 
2 — w € 
sup |r1, | = sup soe dm — 1 =o(1) as. 
pco geo | (@ + oE? 1) (œ + ae?_,)? 
and 
4 1 
sup |ro,1| = sup |(2n7 — 1)  ——— = -> 
cea” ges i (@ +a?) aĝ 
2 5 
w° + 2a9€r_| 
= sup |(2n7 — 1) | == 
seo} | aalw + aye?_,)? 
=o(1) as. 


as t — oo. Thus (7.65) is shown. To show (7.66), it suffices to note that 


2-6 (wo + aoe? )n? 
@ 


2 
+aer, 


fp+s(s2+2) td 


a3 
40) 


IA 


1 


a2 


a 
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Proof of (7.20). We remark that we do not know, a priori, if the derivative of the criterion 
is equal to zero at 6, = (On, An), because we only have the convergence of @, to a. Thus the 
minimum of the criterion could lie at the boundary of ©, even asymptotically. By contrast, the 
partial derivative with respect to the second coordinate must asymptotically vanish at the optimum, 


fo) 
since @, — ap and 6) € ©. A Taylor expansion of the derivative of the criterion thus gives 


1 


— £ Ô, 1 
~ Jn t= =! te r( me m 2 90 — 4 (00) + Inn On — 00), (7.67) 


where J, is a 2 x 2 matrix whose elements are of the form 


I<, F ie 
In Gis j) = n 2 99,00, ED 


with orj between 6, and 6). By Lemma 7.1, which shows that e? — oo almost surely, and by the 
central’ limit theorem of Lindeberg for martingale increment (see Corollary A.1), 


loa 
—)°—£,(4) = 1- n? 
A 5 {1 (0) a a- 17) —— —— 


1 n 3 1 
me -tA 


5N (0 a- 2 (7.68) 
Xg 


Relation (7.64) of Lemma 7.2 and the compactness of © show that 


Jn (2, DVn (ôn — wo) > 0 as. (7.69) 
By a Taylor expansion of the function 
arp L 3 T a), 
we obtain 
J(2,.2) = l De a Li (03 5,9) + l 3 ay, 9+ @")(a5 > — a0), 
= da da i n 3a? : ' 
where œ* is between œž, and a. Using (7.65), (7.66) and (7.19), we obtain 


1 
J, (2,2) > =z as. (7.70) 
a 


We conclude using the second row of (7.67), and also using (7.68), (7.69) and (7.70). 
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Proof of Theorem 7.4 
The proof follows the steps of the proof of Theorem 7.1. We will show the following points: 


(a) limn SUP yee Iln (9) — In()| = 0, a.s. 
(b) (3t € Z such that €,(%) = €: (0o) and o7(~) = 07 (Go) Pyy a.s.) => Y = Go. 
(c) If pF po, Epoki (p) > Egos (90). 
(d) For any g ¥ po there exists a neighborhood V (ø) such that 
liminf inf 1,(y*)> Eg li(Qo), a.s. 


n>% g *eV (p) 


(a) Nullity of the asymptotic impact of the initial values. Equations (7.10)—(7.28) remain valid 
under the convention that €; = €, (V). Equation (7.29) must be replaced by 


6? = č, + Bé,_,+---+ Be, + Ba}, (7.71) 
where č, = (w + J =1 C€;_j,9,..., 0)’, the ‘tilde’ variables being initialized as indicated before. 
Assumptions A7 and A8 imply that, 

for any k> 1 and1 <i<q, sup lexi — č&k—il < Kp*, a.s. (7.72) 
ge® 


It follows that almost surely 


Ilex — &l <D oe- i il 


q 
< J loillčk-i — €k- |(l2er-il + lei — €r- l) 


q 
< Kø (£ lex—i| + 1) 


i=l 
and thus, by (7.28), (7.71) and (7.27), 


t 
Do Bc, = &) + BY (ap — čo) 
k=0 


t q 
KY poh (>: cil ) + Kp! 
k=0 


i=l 


2 x2 
le; — čl = 


t 
< Kp! $, (el +1). (7.73) 
k=-q 


Similarly, we have that almost surely |e? — A < Kp'(\e;|+ 1). The difference between the 
theoretical log-likelihoods with and without initial values can thus be bounded as follows: 


z2 2 o2 2 a?) 
oO, — O; =g. E =E 
sup Il, (9) — hn(y)| < 07 ; 5 sup {|% ~2 2 e+ log (14 2 tl + i; ul} 
r=] PEP õ 0; 6? ő; 


n t 
< {sip : | kw! rates oe (lekl + 1). 
t=1 


pEb k=-q 
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This inequality is analogous to (7.31), e? + 1 being replaced by & = (e? +1) Fy (lekl + 1). 
Following the lines of the proof of (a) in Theorem 7.1 (see Exercise 7.2), it suffices to show that 
for all real r > 0, E(p'£)" is the general term of a finite series. Note that” 


t 
EPE <p? Y E (lel + e? + lel +1)" 
k=—q 
t 
<p? SE) Elel}!? + Ele + Eler? + 1] 
k=—q 


= O(tp"), 


since, by Corollary 2.3, E (e) < œœ. Statement (a) follows. 


(b) Identifiability of the parameter. If €,;(0) = €;(00) almost surely, assumptions A8 and A9 
imply that there exists a constant linear combination of the variables X;_;, j = 0. The linear 
innovation of (X;), equal to X; — E(X;|Xy, u < t) = n10 (p0), is zero almost surely only if n, = 0 
a.s. (since oa? (p0) = œo > 0). This is precluded, since E(n?) = 1. It follows that 3? = Jo, and thus 
that 0 = 0o by the argument used in the proof of Theorem 7.1. 


(c) The limit criterion is minimized at the true value. By the arguments used in the proof 
of (c) in Theorem 7.1, it can ne shown that, for all p, Egọln (p) = Ey, :(@) is defined in R U {+00}, 
and in R at gy = pọ. We have 


Epli (p) — Ego ts (Go) = Egy log 


op (9) E B ga 
of (Yo) 3 of (9) oF (Yo) 
Ory) oF (Yo) _ 1 


BL {ie ooo) FO) 
{er (0) — & (Bo)? 
E 07 (p) 
2710; (Yo) {E0 ) — €1(Vo)} 
o? (9) 


T Eg 


>0 


because the last expectation is equal to 0 (noting that €,() — €, (0o) belongs to the past, as well 
as o; (po) and o;(¢)), the other expectations being positive or null by arguments already used. This 
inequality is strict only if €;(%) = €;(Wo) and if o? (g) = a? (po) Pp a.s. which, by (b), implies 
p = po and completes the proof of (c). 


(d) Use of the compactness of ® and of the ergodicity of (£;(@)). The end of the proof is the 
same as that of Theorem 7.1. 


Proof of Theorem 7.5 


The proof follows the steps of that of Theorem 7.2. The block-diagonal form of the matrices 
T and J when the distribution of 7; is symmetric is shown in Exercise 7.7. It suffices to establish 
the following properties. 


7 We use the fact that if X and Y are positive random variables, E(X + Y)" < E(X) + E(Y)" for all 
r € (0, 1], this inequality being trivially obtained from the inequality already used: (a + b)" < a” + b” for all 
positive real numbers a and b. 
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2L (go) B61 (Go) 3? Li (yo) 
(a) Ep |- y || < © Epo || pag" 
(b) Z and J are invertible. 
-1/2 dLi(g9) _ Abr (eo) n 2al) 2g) ; 
(c) |» 4 “ap (OD and SUPpeV(yo) ||” t=1 | dag" Tosg tend in 


probability to 0 as n > oo. 
A nE leo) > NO, D. 


(e) n=! Si seit a (g*) > Jli, j) a.s., for all g* between ¢, and go. 

Formulas (7.38) and (7.39) giving the derivatives with respect to the GARCH parameters (that 
is, the vector 0) remain valid in the presence of an ARMA part (writing e = €7(9)). The same is 
true for all the results established in (a) and (b) of the proof of Theorem 7.2, with obvious changes 
of notation. The derivatives of £, (p) = €7(0)/o7(g) + logo? (g) with respect to the parameter 9, 
and the cross derivatives with respect to 6 and 2, are given by 


ae A ae 2 2, A 
2 t bear aa (7.74) 
a0 oZ) o 09 ok ðv 
L(g) _ ee 1 8, P JE ei 1 ðo? 1 807 
avan’ of aga of of dù of av’ 
2 ðe ðe 2e, de 2e, (dG 1 da 1 3o? de 
+422 HZ — 4 ——— = L), (7.75) 
o2 ðY AW op AVAD a9 o2 AW o? Ad AY 
WAONE i a 1 8o? of ae ðo? 1 ðo? 2e, de 1 B07 
3ta ar 3 3900 o o2 00 o2 00’ o dÒ a? dV 
(7.76) 
The derivatives of €, are of the form 
dEr 1 
a = (—Ay (1)B3' (1), 4-18), ..- , vr-p (O), ur-1 (0), -.. , r-o (8)) 
where 
v, (0) = —A7' (B)e (9), u; (Ù) = By! (B)e, (0) (7.77) 
and 
01,0 
32e, Oira j : 
asia —A7>!(B)B>!(B)Hp o(t 
IT » (B)B, (B)Ap.o(t) |, 
001 —Ay'(B)B;'(B)Ho,p(t) —2B;?(B)Ho,9(t) 
(7.78) 


where Hk e(t) is the k x £ (Hankel) matrix of general term €;_;_ ;, and Og e denotes the null matrix 
of size k x £. Moreover, by (7.28), 


a 1 derki 
= = BD aie 4, (7.79) 
k=0 i=1 
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where 7; denotes the jth component of %, and 
[e.e] 


Wo; =) E KAD S2 dEr—k—i IEt—k—i ase O erki (7.80) 
A Qi kia ar i . 
00,00 r 00; ade T aD OD 


(a) Integrability of the derivatives of the criterion at ¢9. The existence of the Sec ions in 
(7.40) remains true. By (7.74)—(7.76), the Maependence between (€; /o1)(@o) = =m and o; 2 (vo); its 
derivatives, and the derivatives of €,(% ), using E (n$) < œ and o; 20) > wọ > Q, it suffices to 
show that 


O65 Oe gs Evy : aha (7.81) 
a0 on" TT 0 i 
1 ðo? dok ) En do. Fora ) z 3?0o? D (7.82) 
oo, lo) ; 
of 88 on avon’ || S % || zaar || < 


to establish point (a), together with the existence of the matrices Z and J. By the expressions for 
the derivatives of €;, (7.77)-(7.78), and using Ee? (30) < ©, we obtain (7.81). 
The Cauchy—Schwarz inequality implies that 


2f q rr 
{Yond (Zu (e) 


dj 


ðe i (V0) 
Sauer k- (09) s 


i=l 


Thus, in view of (7.79) and the positivity of wọ, 
1/2 1/2 
3o? a to j 5 derri (90) \7 
39, < o? B&C, 1) co + Z aoii Po) 2% (=) ; 


Using the triangle inequality and the elementary inequalities ()~ lx)! < D jx;|!/2 and x/( + 
x?) < 1, it follows that 


1 ao? % BA, DeO A, DEL a? || 
< I2 J 
oe a0; — (p0) P = 2 w+ BEAL, 1)c, (1) (Yo) l 
BEC, Daa P r 
€t—k—i 
< BPG, p— 2 a Pee (oo) 
>? ig BRA, Dt) 2 2 ; av; 
= 2 
ee ie 
Le Da 1/2 = k-i (Vo) oe aaa) 
Vo i=1 dj 2 


The first inequality of (7.82) follows. The existence of the second expectation in (7.82) is a 
consequence of (7.80), the Cauchy—Schwarz inequality, and the square integrability of €, and its 
derivatives. To handle the second-order partial derivatives of ora first note that ( aa? /d030@)(Yo) = 
0 by (7.41). Moreover, using (7.79), 


€t—k-i 
av; 


co 
I BK, Denki 


k=0 


(Yo)| < œ. (7.84) 


Ew 
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By the arguments used to show (7.44), we obtain 


070? 
= (¥0) 


E 
Boe po 39 Be 


z Ep 


oo q4 

Erki 
> KBE 1,1 ` 2000; €:—-k—i < o, 7.85) 
2 ( 2. 0i€t—k 30, (po) ( 


which entails the existence of the third expectation in (7.82). 


(b) Invertibility of Z and J. Assume that Z is noninvertible. There exists a nonzero vector A in 
IRP+2+P+4+? such that A’dL;(yo)/Iy! = 0 a.s. By (7.38) and (7.74), this implies that 


1 ðo? 2 v 
(1 — 92) n EO y 2 bE gag (7.86) 
o2(y) dag (go) ay 


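Taking the variance of the left-hand side, conditionally on the $\sigma$-field generated by $\{\eta_u,u<t\}$, we obtain a.s., at $\varphi=\varphi_0$,

$$0=(\kappa_\eta-1)\left(\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\varphi'}\lambda\right)^2-2\nu_3\left(\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\varphi'}\lambda\right)\left(\frac{2}{\sigma_t}\frac{\partial\epsilon_t}{\partial\varphi'}\lambda\right)+\left(\frac{2}{\sigma_t}\frac{\partial\epsilon_t}{\partial\varphi'}\lambda\right)^2:=(\kappa_\eta-1)a_t^2-2\nu_3a_tb_t+b_t^2=(\kappa_\eta-1-\nu_3^2)a_t^2+(b_t-\nu_3a_t)^2,$$

where $\nu_3=E(\eta_t^3)$. It follows that $\kappa_\eta-1-\nu_3^2\le0$ and $b_t=a_t\{\nu_3\pm(\nu_3^2+1-\kappa_\eta)^{1/2}\}$ a.s. By stationarity, we have either $b_t=a_t\{\nu_3-(\nu_3^2+1-\kappa_\eta)^{1/2}\}$ a.s. for all $t$, or $b_t=a_t\{\nu_3+(\nu_3^2+1-\kappa_\eta)^{1/2}\}$ a.s. for all $t$. Consider for instance the latter case, the first one being treated similarly. Relation (7.86) implies $a_t[1-\eta_t^2+\{\nu_3+(\nu_3^2+1-\kappa_\eta)^{1/2}\}\eta_t]=0$ a.s. The term in brackets cannot vanish almost surely, otherwise $\eta_t$ would take at most two different values, which would be in contradiction to assumption A12. It follows that $a_t=0$ a.s. and thus $b_t=0$ a.s. We have shown that almost surely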


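$$\lambda'\frac{\partial\epsilon_t}{\partial\varphi}(\varphi_0)=\lambda_1'\frac{\partial\epsilon_t}{\partial\vartheta}(\vartheta_0)=0\quad\text{and}\quad\lambda'\frac{\partial\sigma_t^2}{\partial\varphi}(\varphi_0)=0,\tag{7.87}$$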


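where $\lambda_1$ is the vector of the first $P+Q+1$ components of $\lambda$. By stationarity of $(\partial\epsilon_t/\partial\vartheta)_t$, the first equality implies that

$$0=\lambda_1'\begin{pmatrix}-A_{\vartheta_0}(1)\\c_0-X_{t-1}\\\vdots\\c_0-X_{t-P}\\\epsilon_{t-1}\\\vdots\\\epsilon_{t-Q}\end{pmatrix}+\sum_{j=1}^{Q}b_{0j}\,\lambda_1'\frac{\partial\epsilon_{t-j}}{\partial\vartheta}(\vartheta_0)=\lambda_1'\begin{pmatrix}-A_{\vartheta_0}(1)\\c_0-X_{t-1}\\\vdots\\c_0-X_{t-P}\\\epsilon_{t-1}\\\vdots\\\epsilon_{t-Q}\end{pmatrix}.$$

We now use assumption A9, that the ARMA representation is minimal, to conclude that $\lambda_1=0$.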


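The third equality in (7.87) is then written, with obvious notation, as $\lambda_2'\,\partial\sigma_t^2(\varphi_0)/\partial\theta=0$. We have already shown in the proof of Theorem 7.2 that this entails $\lambda_2=0$. We are led to a contradiction, which proves that $\mathcal I$ is invertible. Using (7.39) and (7.75)-(7.76), we obtain

$$\mathcal J=E\left(\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\varphi}\frac{\partial\sigma_t^2}{\partial\varphi'}(\varphi_0)\right)+2E\left(\frac{1}{\sigma_t^2}\frac{\partial\epsilon_t}{\partial\varphi}\frac{\partial\epsilon_t}{\partial\varphi'}(\varphi_0)\right).$$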
We have just shown that the first expectation is a positive definite matrix. The second expectation being a positive semi-definite matrix, $\mathcal J$ is positive definite and thus invertible, which completes the proof of (b).
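The conditional variance identity used in (b) can be checked numerically. The sketch below assumes a discrete law for $\eta_t$ chosen purely for illustration (support $\{-1,0,2\}$ with probabilities $1/3,1/2,1/6$, so that $E\eta_t=0$ and $E\eta_t^2=1$), computes the exact variance of $a(1-\eta^2)+b\eta$, and compares it with $(\kappa_\eta-1)a^2-2\nu_3ab+b^2$ and with the completed-square form. It also shows why laws concentrated on two points must be excluded: for $\eta_t=\pm1$ the bracket $1-\eta_t^2$ vanishes identically.

```python
import numpy as np

def exact_variance(values, probs, a, b):
    """Variance of a*(1 - eta**2) + b*eta when eta has a finite support."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    x = a * (1.0 - values**2) + b * values
    mean = np.sum(probs * x)
    return float(np.sum(probs * (x - mean) ** 2))

def formula_variance(values, probs, a, b):
    """Return ((kappa-1)a^2 - 2 nu3 a b + b^2, kappa, nu3), assuming E eta = 0 and E eta^2 = 1."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    kappa = float(np.sum(probs * values**4))
    nu3 = float(np.sum(probs * values**3))
    return (kappa - 1.0) * a**2 - 2.0 * nu3 * a * b + b**2, kappa, nu3

# Illustrative skewed law: P(-1) = 1/3, P(0) = 1/2, P(2) = 1/6 (mean 0, variance 1).
support, weights = [-1.0, 0.0, 2.0], [1 / 3, 1 / 2, 1 / 6]
var_formula, kappa, nu3 = formula_variance(support, weights, 1.0, 0.5)
print(exact_variance(support, weights, 1.0, 0.5), var_formula, kappa - 1 - nu3**2)
```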


(c) Asymptotic unimportance of the initial values. The initial values being fixed, the derivatives of $\tilde\sigma_t^2$, obtained from (7.71), are given by


t 


À map PoS 
ES E whe, = B= ya oats, 


k=1 


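with the notation introduced in (7.41)-(7.42) and (7.55)-(7.56). As for (7.79), we obtain

$$\frac{\partial\tilde\sigma_t^2}{\partial\vartheta}=\sum_{k=0}^{t-1}B^k(1,1)\left(\sum_{i=1}^{q}2\alpha_i\tilde\epsilon_{t-k-i}\frac{\partial\tilde\epsilon_{t-k-i}}{\partial\vartheta}\right)$$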


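and, by an obvious extension of (7.72),

$$\sup_{\varphi\in\Phi}\max\left\{\left|\epsilon_t-\tilde\epsilon_t\right|,\left\|\frac{\partial\epsilon_t}{\partial\vartheta}-\frac{\partial\tilde\epsilon_t}{\partial\vartheta}\right\|\right\}\le K\rho^t\quad\text{a.s.}\tag{7.88}$$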


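Thus

$$\left|\frac{\partial\sigma_t^2}{\partial\vartheta_j}-\frac{\partial\tilde\sigma_t^2}{\partial\vartheta_j}\right|\le\left|\sum_{k=t}^{\infty}B^k(1,1)\sum_{i=1}^{q}2\alpha_i\epsilon_{t-k-i}\frac{\partial\epsilon_{t-k-i}}{\partial\vartheta_j}\right|+\sum_{k=0}^{t-1}B^k(1,1)\sum_{i=1}^{q}2\alpha_i\left\{\left|\epsilon_{t-k-i}-\tilde\epsilon_{t-k-i}\right|\left|\frac{\partial\epsilon_{t-k-i}}{\partial\vartheta_j}\right|+\left|\tilde\epsilon_{t-k-i}\right|\left|\frac{\partial\epsilon_{t-k-i}}{\partial\vartheta_j}-\frac{\partial\tilde\epsilon_{t-k-i}}{\partial\vartheta_j}\right|\right\}$$
$$\le K\rho^t\sum_{k=0}^{\infty}\rho^k\sum_{i=1}^{q}\left|\epsilon_{-k-i}\frac{\partial\epsilon_{-k-i}}{\partial\vartheta_j}\right|+K\rho^t\sum_{k=1}^{t-1+q}\left(1+\left|\epsilon_{t-k}\right|+\left|\frac{\partial\epsilon_{t-k}}{\partial\vartheta_j}\right|\right)^2$$
$$\le K\rho^{t/2}\sum_{k=-\infty}^{\infty}\rho^{|k|/2}\left(1+\left|\epsilon_{k}\right|+\left|\frac{\partial\epsilon_{k}}{\partial\vartheta_j}\right|\right)^2.$$

The latter sum converges almost surely because its expectation is finite. We have thus shown that, for some (possibly different) $\rho\in(0,1)$,

$$\sup_{\varphi\in\Phi}\left|\frac{\partial\sigma_t^2}{\partial\vartheta_j}-\frac{\partial\tilde\sigma_t^2}{\partial\vartheta_j}\right|\le K\rho^t\quad\text{a.s.}$$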


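The other derivatives of $\sigma_t^2$ are handled similarly, and we obtain

$$\sup_{\varphi\in\Phi}\left\|\frac{\partial\sigma_t^2}{\partial\varphi}-\frac{\partial\tilde\sigma_t^2}{\partial\varphi}\right\|\le K\rho^t\quad\text{a.s.}$$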
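We have, in view of (7.73),

$$\left|\frac{1}{\tilde\sigma_t^2}-\frac{1}{\sigma_t^2}\right|=\frac{\left|\sigma_t^2-\tilde\sigma_t^2\right|}{\tilde\sigma_t^2\sigma_t^2}\le\frac{K\rho^t}{\sigma_t^2}S_{t-1},\qquad\frac{\sigma_t^2}{\tilde\sigma_t^2}\le1+K\rho^tS_{t-1},$$

where $S_{t-1}=\sum_{i=1}^{t-1}(|\epsilon_i|+1)$. It is also easy to check that for $\varphi=\varphi_0$,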


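$$\left|\frac{\tilde\epsilon_t}{\tilde\sigma_t}\right|\le K\rho^t+|\eta_t|\left(1+K\rho^tS_{t-1}\right),\qquad\left|1-\frac{\tilde\epsilon_t^2}{\tilde\sigma_t^2}\right|\le K\left\{1+\eta_t^2\left(1+S_{t-1}\right)\right\}.$$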


It follows that, using (7.88),

$$\left|\frac{\partial\ell_t(\varphi_0)}{\partial\varphi_i}-\frac{\partial\tilde\ell_t(\varphi_0)}{\partial\varphi_i}\right|\le K\rho^t\left(1+\eta_t^2\right)\left(1+S_{t-1}^2\right)\left\{1+\left|\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\varphi_i}(\varphi_0)\right|+\left|\frac{\partial\epsilon_t}{\partial\varphi_i}(\varphi_0)\right|\right\}.$$


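Using the independence between $\eta_t$ and $S_{t-1}$, (7.40), (7.83), the Cauchy-Schwarz inequality and $E(\epsilon_t^4)<\infty$, we obtain

$$P\left(\left|n^{-1/2}\sum_{t=1}^{n}\left\{\frac{\partial\ell_t(\varphi_0)}{\partial\varphi_i}-\frac{\partial\tilde\ell_t(\varphi_0)}{\partial\varphi_i}\right\}\right|>\varepsilon\right)\le\frac{K}{\varepsilon}n^{-1/2}\sum_{t=1}^{n}\rho^t\left\{1+\left\|\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\varphi_i}(\varphi_0)\right\|_2+\left\|\frac{\partial\epsilon_t}{\partial\varphi_i}(\varphi_0)\right\|_2\right\}\left\|1+S_{t-1}^2\right\|_2$$
$$\le\frac{K}{\varepsilon}\left\{1+\left\|\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\varphi_i}(\varphi_0)\right\|_2+\left\|\frac{\partial\epsilon_t}{\partial\varphi_i}(\varphi_0)\right\|_2\right\}n^{-1/2}\sum_{t=1}^{\infty}\rho^tt^2\to0,$$

which shows the first part of (c). The second is established by the same arguments.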
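Part (c) rests on the fact that the influence of the initial values dies out at a geometric rate. The sketch below is a deliberate simplification: it assumes a pure GARCH(1,1) recursion rather than the ARMA-GARCH model of Theorem 7.5, with arbitrary parameter values. Two runs that differ only through the initial value $\tilde\sigma_0^2$ then differ by exactly $\beta^t$ times the initial gap, which is the kind of $K\rho^t$ bound used above.

```python
import numpy as np

def garch11_variance(eps, omega, alpha, beta, sigma2_init):
    """sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1], with sigma2[0] = sigma2_init."""
    sigma2 = np.empty(len(eps))
    sigma2[0] = sigma2_init
    for t in range(1, len(eps)):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(0)
eps = rng.standard_normal(200)                 # any observed sample can be used here
s_a = garch11_variance(eps, 0.1, 0.05, 0.9, sigma2_init=0.0)
s_b = garch11_variance(eps, 0.1, 0.05, 0.9, sigma2_init=5.0)
gap = np.abs(s_a - s_b)                        # equals 5 * 0.9**t up to rounding
print(gap[[0, 50, 100, 199]])
```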


(d) Use of a CLT for martingale increments. The proof of this point is exactly the same as that 
of the pure GARCH case (see the proof of Theorem 7.2). 


(e) Convergence to the matrix $\mathcal J$. This part of the proof differs drastically from that of Theorem 7.2. For pure GARCH, we used a Taylor expansion of the second-order derivatives of the criterion, and showed that the third-order derivatives were uniformly integrable in a neighborhood of $\theta_0$. Without additional assumptions, this argument fails in the ARMA-GARCH case because variables of the form $\sigma_t^{-2}(\partial\sigma_t^2/\partial\vartheta)$ do not necessarily have moments of all orders, even at the true value of the parameter. First note that, since $\mathcal J$ exists, the ergodic theorem implies that

$$\lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2\ell_t(\varphi_0)}{\partial\varphi\,\partial\varphi'}=\mathcal J\quad\text{a.s.}$$
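The consistency of $\hat\varphi_n$ having already been established, it suffices to show that for all $\varepsilon>0$, there exists a neighborhood $V(\varphi_0)$ of $\varphi_0$ such that almost surely

$$\lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\sup_{\varphi\in V(\varphi_0)}\left\|\frac{\partial^2\ell_t(\varphi)}{\partial\varphi\,\partial\varphi'}-\frac{\partial^2\ell_t(\varphi_0)}{\partial\varphi\,\partial\varphi'}\right\|\le\varepsilon\tag{7.89}$$

(see Exercise 7.9). We first show that there exists $V(\varphi_0)$ such that

$$E\sup_{\varphi\in V(\varphi_0)}\left\|\frac{\partial^2\ell_t(\varphi)}{\partial\varphi\,\partial\varphi'}\right\|<\infty.\tag{7.90}$$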


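By Hölder's inequality, (7.39), (7.75) and (7.76), it suffices to show that for any neighborhood $V(\varphi_0)\subset\Phi$ whose elements have their components $\alpha_i$ and $\beta_j$ bounded above by a positive constant, the quantities

$$\left\|\sup_{\varphi\in V(\varphi_0)}|\epsilon_t|\right\|_4,\quad\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{\partial\epsilon_t}{\partial\vartheta}\right\|\right\|_4,\quad\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{\partial^2\epsilon_t}{\partial\vartheta\,\partial\vartheta'}\right\|\right\|_4,\tag{7.91}$$
$$\left\|\sup_{\varphi\in V(\varphi_0)}\frac{1}{\sigma_t^2}\right\|_\infty,\quad\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\vartheta}\right\|\right\|_4,\quad\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta}\right\|\right\|_4,\tag{7.92}$$
$$\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{\partial^2\sigma_t^2}{\partial\vartheta\,\partial\vartheta'}\right\|\right\|_2,\quad\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{\partial^2\sigma_t^2}{\partial\vartheta\,\partial\theta'}\right\|\right\|_2,\quad\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{1}{\sigma_t^2}\frac{\partial^2\sigma_t^2}{\partial\theta\,\partial\theta'}\right\|\right\|_2\tag{7.93}$$

are finite. Using the expansion of the series

$$\epsilon_t(\vartheta)=A_\vartheta(B)B_\vartheta^{-1}(B)A_{\vartheta_0}^{-1}(B)B_{\vartheta_0}(B)\epsilon_t(\vartheta_0),$$

similar expansions for the derivatives, and $\|\epsilon_t(\vartheta_0)\|_4<\infty$, it can be seen that the norms in (7.91) are finite. In (7.92) the first norm is finite, as an obvious consequence of $\sigma_t^2\ge\inf_{\varphi\in\Phi}\omega$, this latter term being strictly positive by compactness of $\Phi$. An extension of inequality (7.83) leads to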


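$$\left\|\sup_{\varphi\in\Phi}\left|\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\vartheta_j}\right|\right\|_4\le K\sum_{k=0}^{\infty}\rho^k\left\|\sup_{\varphi\in\Phi}\left|\frac{\partial\epsilon_{t-k}}{\partial\vartheta_j}\right|\right\|_4<\infty.$$

Moreover, since (7.41)-(7.44) remain valid when $\epsilon_t$ is replaced by $\epsilon_t(\vartheta)$, it can be shown that

$$\left\|\sup_{\varphi\in V(\varphi_0)}\left\|\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta}\right\|\right\|_d<\infty$$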


for any $d>0$ and any neighborhood $V(\varphi_0)$ whose elements have their components $\alpha_i$ and $\beta_j$ bounded from below by a positive constant. The norms in (7.92) are thus finite. The existence of the first norm of (7.93) follows from (7.80) and (7.91). To handle the second one, we use (7.84),
(7.85), (7.91), and the fact that SUP ge Wyo) ie < œ. Finally, it can be shown that the third norm 
is finite by (7.47), (7.48) and by arguments already used. The property (7.90) is thus established. 
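The ergodic theorem shows that the limit in (7.89) is equal almost surely to

$$E\sup_{\varphi\in V(\varphi_0)}\left\|\frac{\partial^2\ell_t(\varphi)}{\partial\varphi\,\partial\varphi'}-\frac{\partial^2\ell_t(\varphi_0)}{\partial\varphi\,\partial\varphi'}\right\|.$$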
By the dominated convergence theorem, using (7.90), this expectation tends to 0 when the neighborhood $V(\varphi_0)$ tends to the singleton $\{\varphi_0\}$. Thus (7.89) holds true, which proves (e). The proof of Theorem 7.5 is now complete.


7.5 Bibliographical Notes 


The asymptotic properties of the QMLE of the ARCH models have been established by Weiss (1986) under the condition that the moment of order 4 exists. In the GARCH(1, 1) case, the asymptotic properties have been established by Lumsdaine (1996) (see also Lee and Hansen, 1994) for the local QMLE under the strict stationarity assumption. In Lumsdaine (1996) the conditions on the coefficients $\alpha$ and $\beta$ allow one to handle the IGARCH(1, 1) model. They are, however, very restrictive with regard to the iid process: it is assumed that $E|\eta_t|^{32}<\infty$ and that the density of $\eta_t$ has a unique mode and is bounded in a neighborhood of 0. In Lee and Hansen (1994) the consistency of the global estimator is obtained under the assumption of second-order stationarity.

Berkes, Horváth and Kokoszka (2003b) was the first paper to give a rigorous proof of the asymptotic properties of the QMLE in the GARCH(p, q) case under very weak assumptions; see also Berkes and Horváth (2003b, 2004), together with Boussama (1998, 2000). The assumptions given in Berkes, Horváth and Kokoszka (2003b) were weakened slightly in Francq and Zakoian (2004). The proofs presented here come from that paper. An extension to non-iid errors was recently proposed by Escanciano (2009).

Jensen and Rahbek (2004a, 2004b) have shown that the parameter $\alpha_0$ of an ARCH(1) model, or the parameters $\alpha_0$ and $\beta_0$ of a GARCH(1, 1) model, can be consistently estimated, with a standard Gaussian asymptotic distribution and a standard rate of convergence, even if the parameters are outside the strict stationarity region. They considered a constrained version of the QMLE, in which the intercept $\omega$ is fixed (see Exercises 7.13 and 7.14). These results were misunderstood by a number of researchers and practitioners, who wrongly claimed that the QMLE of the GARCH parameters is consistent and asymptotically normal without any stationarity constraint. We have seen in Section 7.1.3 that the QMLE of $\omega_0$ is inconsistent in the nonstationary case.

For ARMA-GARCH models, asymptotic results have been established by Ling and Li (1997, 
1998), Ling and McAleer (2003a, 2003b) and Francq and Zakoian (2004). A comparison of the 
assumptions used in these papers can be found in the last reference. We refer the reader to Straumann (2005) for a detailed monograph on the estimation of GARCH models, to Francq and Zakoian
(2009a) for a recent review of the literature, and to Straumann and Mikosch (2006) and Bardet and 
Wintenberger (2009) for extensions to other conditionally heteroscedastic models. Li, Ling and 
McAleer (2002) reviewed the literature on the estimation of ARMA-GARCH models, including 
in particular the case of nonstationary models. 

The proof of the asymptotic normality of the QMLE of ARMA models under the second-order 
moment assumption can be found, for instance, in Brockwell and Davis (1991). For ARMA models 
with infinite variance noise, see Davis, Knight and Liu (1992), Mikosch, Gadrich, Klüppelberg and
Adler (1995) and Kokoszka and Taqqu (1996). 


7.6 Exercises 
7.1 (The distribution of $\eta_t$ is symmetric for GARCH models)
The aim of this exercise is to show property (7.24).
1. Show the result for $j<0$.
2. For $j>0$, explain why $E\{g(\epsilon_t^2,\epsilon_{t-1}^2,\ldots)\mid\epsilon_{t-j},\epsilon_{t-j-1},\ldots\}$ can be written as $h(\epsilon_{t-j}^2,\epsilon_{t-j-1}^2,\ldots)$ for some function $h$.
3. Complete the proof of (7.24).


7.2 (Almost sure convergence to zero at an exponential rate)
Let $(\epsilon_t)$ be a strictly stationary process admitting a moment of order $s>0$. Show that if $\rho\in(0,1)$, then $\rho^t\epsilon_t^2\to0$ a.s.

7.3 (Ergodic theorem for nonintegrable processes)
Prove the following ergodic theorem. If $(X_t)$ is an ergodic and strictly stationary process and if $EX_1$ exists in $\mathbb R\cup\{+\infty\}$, then

$$n^{-1}\sum_{t=1}^{n}X_t\to EX_1\quad\text{a.s., as }n\to\infty.$$

The result is shown in Billingsley (1995, p. 284) for iid variables.
Hint: Consider the truncated variables $X_t^\kappa=X_t\mathbb 1_{X_t\le\kappa}$, where $\kappa>0$ with $\kappa$ tending to $+\infty$.

7.4 (Uniform ergodic theorem)
Let $\{X_t(\theta)\}$ be a process of the form

$$X_t(\theta)=f(\theta,\eta_t,\eta_{t-1},\ldots),\tag{7.94}$$

where $(\eta_t)$ is strictly stationary and ergodic and $f$ is continuous in $\theta\in\Theta$, $\Theta$ being a compact subset of $\mathbb R^d$.
1. Show that the process $\{\inf_{\theta\in\Theta}X_t(\theta)\}$ is strictly stationary and ergodic.
2. Does the property still hold true if $X_t(\theta)$ is not of the form (7.94) but it is assumed that $\{X_t(\theta)\}$ is strictly stationary and ergodic and that $X_t(\theta)$ is a continuous function of $\theta$?

7.5 (OLS estimator of a GARCH)
In the framework of the GARCH(p, q) model (7.1), an OLS estimator of $\theta$ is defined as any measurable solution $\hat\theta_n$ of

$$\hat\theta_n=\arg\min_{\theta\in\Theta}\tilde Q_n(\theta),\quad\Theta\subset\mathbb R^{p+q+1},$$

where

$$\tilde Q_n(\theta)=n^{-1}\sum_{t=1}^{n}\tilde e_t^2(\theta),\quad\tilde e_t(\theta)=\epsilon_t^2-\tilde\sigma_t^2(\theta),$$

and $\tilde\sigma_t^2(\theta)$ is defined by (7.4) with, for instance, initial values given by (7.6) or (7.7). Note that the estimator is unconstrained and that the variable $\tilde\sigma_t^2(\theta)$ can take negative values. Similarly, a constrained OLS estimator is defined by

$$\hat\theta_n^c=\arg\min_{\theta\in\Theta^c}\tilde Q_n(\theta),\quad\Theta^c\subset(0,+\infty)\times[0,+\infty)^{p+q}.$$

The aim of this exercise is to show that under the assumptions of Theorem 7.1, and if $E_{\theta_0}\epsilon_t^4<\infty$, the constrained and unconstrained OLS estimators are strongly consistent. We consider the theoretical criterion

$$Q_n(\theta)=n^{-1}\sum_{t=1}^{n}e_t^2(\theta),\quad e_t(\theta)=\epsilon_t^2-\sigma_t^2(\theta).$$

1. Show that $\sup_{\theta\in\Theta}|\tilde Q_n(\theta)-Q_n(\theta)|\to0$ almost surely as $n\to\infty$.
2. Show that the asymptotic criterion is minimized at $\theta_0$,
$$\forall\theta\in\Theta,\quad\lim_{n\to\infty}Q_n(\theta)\ge\lim_{n\to\infty}Q_n(\theta_0),$$
and that $\theta_0$ is the unique minimum.
3. Prove that $\hat\theta_n\to\theta_0$ almost surely as $n\to\infty$.
4. Show that $\hat\theta_n^c\to\theta_0$ almost surely as $n\to\infty$.
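For a pure ARCH(q), $\tilde\sigma_t^2(\theta)$ is linear in $\theta$, so the unconstrained OLS estimator of Exercise 7.5 reduces to a linear regression of $\epsilon_t^2$ on $(1,\epsilon_{t-1}^2,\ldots,\epsilon_{t-q}^2)$. The sketch below assumes Gaussian $\eta_t$ and an ARCH(1) with $\omega_0=1$ and $\alpha_0=0.2$ (so that $E\epsilon_t^4<\infty$), and it simply drops the first $q$ observations instead of using the initial values (7.6) or (7.7); the helper names are hypothetical and only the unconstrained estimator is computed.

```python
import numpy as np

def simulate_arch1(n, omega, alpha, rng, burn_in=500):
    """Simulate an ARCH(1) with Gaussian eta_t, discarding a burn-in period."""
    eps = np.empty(n + burn_in)
    prev = 0.0  # arbitrary starting value; its effect is removed by the burn-in
    for t in range(n + burn_in):
        eps[t] = np.sqrt(omega + alpha * prev**2) * rng.standard_normal()
        prev = eps[t]
    return eps[burn_in:]

def ols_arch(eps, q):
    """Unconstrained OLS: regress eps_t^2 on (1, eps_{t-1}^2, ..., eps_{t-q}^2)."""
    e2 = eps**2
    n = len(e2)
    lags = [e2[q - i : n - i] for i in range(1, q + 1)]   # column i holds eps_{t-i}^2
    X = np.column_stack([np.ones(n - q)] + lags)
    y = e2[q:]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta  # (omega_hat, alpha_1_hat, ..., alpha_q_hat); entries may be negative

rng = np.random.default_rng(1)
eps = simulate_arch1(20_000, omega=1.0, alpha=0.2, rng=rng)
print(ols_arch(eps, q=1))
```

A constrained version would additionally restrict the minimization to $\Theta^c$, which no longer reduces to a single call to a least-squares solver.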


7.6 (The mean of the squares of the normalized residuals is equal to 1)
For a GARCH model, estimated by QML with initial values set to zero, the normalized residuals are defined by $\hat\eta_t=\epsilon_t/\tilde\sigma_t(\hat\theta_n)$, $t=1,\ldots,n$. Show that almost surely

$$\frac{1}{n}\sum_{t=1}^{n}\hat\eta_t^2=1.$$

Hint: Note that for all $c>0$, there exists $\theta^*$ such that $\tilde\sigma_t^2(\theta^*)=c\,\tilde\sigma_t^2(\theta)$ for all $t\ge0$, and consider the function $c\mapsto\tilde l_n(\theta^*)$.
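The hint of Exercise 7.6 can be illustrated numerically. For an ARCH(1) with zero initial values, multiplying $(\omega,\alpha_1)$ by $c$ multiplies every $\tilde\sigma_t^2$ by $c$, and the criterion $c\mapsto\tilde l_n$ equals $A/c+\log c$ plus a constant, with $A=n^{-1}\sum_t\epsilon_t^2/\tilde\sigma_t^2(\theta)$; it is minimized at $c^*=A$, after which the squared normalized residuals average exactly 1. The sketch uses simulated Gaussian data and an arbitrary, non-optimal $\theta=(0.7,0.4)$. It illustrates the scaling argument only, not a full QML fit.

```python
import numpy as np

def arch1_sigma2(eps, omega, alpha):
    """sigma~_t^2 = omega + alpha * eps_{t-1}^2, with the initial value eps_0 = 0."""
    lagged = np.concatenate(([0.0], eps[:-1]))
    return omega + alpha * lagged**2

def qml_criterion(eps, sigma2):
    """Gaussian quasi-likelihood criterion n^{-1} sum(eps^2 / sigma2 + log sigma2)."""
    return float(np.mean(eps**2 / sigma2 + np.log(sigma2)))

rng = np.random.default_rng(2)
eps = rng.standard_normal(1_000) * 1.3         # any data set will do for the scaling argument
omega, alpha = 0.7, 0.4                        # arbitrary, non-optimal parameter
c_star = float(np.mean(eps**2 / arch1_sigma2(eps, omega, alpha)))
sigma2_star = arch1_sigma2(eps, c_star * omega, c_star * alpha)   # equals c_star * sigma2
print(c_star, np.mean(eps**2 / sigma2_star))
```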


7.7 ($\mathcal I$ and $\mathcal J$ block-diagonal)
Show that $\mathcal I$ and $\mathcal J$ have the block-diagonal form given in Theorem 7.5 when the distribution of $\eta_t$ is symmetric.


7.8 (Forms of $\mathcal I$ and $\mathcal J$ in the AR(1)-ARCH(1) case)
We consider the QML estimation of the AR(1)-ARCH(1) model

$$X_t=a_0X_{t-1}+\epsilon_t,\quad\epsilon_t=\sigma_t\eta_t,\quad\sigma_t^2=1+\alpha_0\epsilon_{t-1}^2,$$

assuming that $\omega_0=1$ is known and without specifying the distribution of $\eta_t$.
1. Give the explicit form of the matrices $\mathcal I$ and $\mathcal J$ in Theorem 7.5 (with an obvious adaptation of the notation because the parameter here is $(a_0,\alpha_0)$).
2. Give the block-diagonal form of these matrices when the distribution of $\eta_t$ is symmetric, and verify that the asymptotic variance of the estimator of the ARCH parameter
(i) does not depend on the AR parameter, and
(ii) is the same as for the estimator of a pure ARCH (without the AR part).
3. Compute $\Sigma$ when $\alpha_0=0$. Is the asymptotic variance of the estimator of $a_0$ the same as that obtained when estimating an AR(1)? Verify the results obtained by simulation in the corresponding column of Table 7.3.


7.9 (A useful result in showing asymptotic normality)
Let $(J_t(\theta))$ be a sequence of random matrices, which are functions of a vector of parameters $\theta$. We consider an estimator $\hat\theta_n$ which strongly converges to the vector $\theta_0$. Assume that

$$\frac{1}{n}\sum_{t=1}^{n}J_t(\theta_0)\to J,\quad\text{a.s.},$$

where $J$ is a matrix. Show that if for all $\varepsilon>0$ there exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that

$$\lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in V(\theta_0)}\left\|J_t(\theta)-J_t(\theta_0)\right\|\le\varepsilon,\quad\text{a.s.},\tag{7.95}$$

where $\|\cdot\|$ denotes a matrix norm, then

$$\frac{1}{n}\sum_{t=1}^{n}J_t(\hat\theta_n)\to J,\quad\text{a.s.}$$

Give an example showing that condition (7.95) is not necessary for the latter convergence to hold in probability.


7.10 (A lower bound for the asymptotic variance of the QMLE of an ARCH)
Show that, for the ARCH(q) model, under the assumptions of Theorem 7.2,

$$\mathrm{Var}_{as}\{\sqrt n(\hat\theta_n-\theta_0)\}\ge(\kappa_\eta-1)\theta_0\theta_0',$$

in the sense that the difference is a positive semi-definite matrix.
Hint: Compute $\theta_0'\,\partial\sigma_t^2(\theta_0)/\partial\theta$ and show that $J-J\theta_0\theta_0'J$ is a variance matrix.
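The hint of Exercise 7.10 can be checked on simulated data. The sketch assumes a Gaussian ARCH(1) with $\omega_0=1$ and $\alpha_0=0.5$, and replaces $J$ by its empirical counterpart built from $Z_t=\sigma_t^{-2}\partial\sigma_t^2/\partial\theta$. Since $\theta_0'Z_t=1$ for every $t$, $\hat J-\hat J\theta_0\theta_0'\hat J$ is an empirical covariance matrix, so $\hat J^{-1}-\theta_0\theta_0'$ is positive semi-definite. Its smallest eigenvalue is numerically zero, because $\hat J\theta_0$ lies in its kernel.

```python
import numpy as np

rng = np.random.default_rng(3)
omega0, alpha0, n = 1.0, 0.5, 50_000
eps = np.empty(n)
prev = 0.0
for t in range(n):                                  # simulate a Gaussian ARCH(1)
    eps[t] = np.sqrt(omega0 + alpha0 * prev**2) * rng.standard_normal()
    prev = eps[t]

lagged_sq = np.concatenate(([0.0], eps[:-1] ** 2))  # eps_{t-1}^2, with eps_0 = 0
sigma2 = omega0 + alpha0 * lagged_sq
# Z_t = sigma_t^{-2} d sigma_t^2 / d theta, with theta = (omega, alpha)
Z = np.column_stack([np.ones(n), lagged_sq]) / sigma2[:, None]
theta0 = np.array([omega0, alpha0])
J_hat = Z.T @ Z / n
gap = np.linalg.inv(J_hat) - np.outer(theta0, theta0)
eigenvalues = np.linalg.eigvalsh(gap)               # ascending order
print(eigenvalues)
```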


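7.11 (A striking property of $J$)
For a GARCH(p, q) model we have, under the assumptions of Theorem 7.2,

$$J=E(Z_tZ_t'),\quad Z_t=\frac{1}{\sigma_t^2(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}.$$

The objective of the exercise is to show that

$$\Omega'J^{-1}\Omega=1,\quad\text{where }\Omega=E(Z_t).\tag{7.96}$$

1. Show the property in the ARCH case.
Hint: Compute $\theta_0'Z_t$, $\theta_0'J$ and $\theta_0'\Omega$.
2. In the GARCH case, let $\underline\theta=(\omega,\alpha_1,\ldots,\alpha_q,0,\ldots,0)'$. Show that

$$\underline\theta'\frac{\partial\sigma_t^2(\theta)}{\partial\theta}=\sigma_t^2(\theta).$$

3. Complete the proof of (7.96).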
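Property (7.96) can also be checked on a simulated GARCH(1,1). The sketch assumes Gaussian $\eta_t$, arbitrary parameter values, and the initial value $\tilde\sigma_0^2=\omega$ (so that $\partial\tilde\sigma_0^2/\partial\theta=(1,0,0)'$), and it replaces $J$ and $\Omega$ by empirical means. With these initial values the identity $\underline\theta'Z_t=1$ of question 2 holds for every $t$, so the empirical version of $\Omega'J^{-1}\Omega$ equals 1 up to rounding.

```python
import numpy as np

def garch11_scores(eps, omega, alpha, beta):
    """Rows Z_t = sigma_t^{-2} d sigma_t^2 / d(omega, alpha, beta).

    Assumed initial value: sigma_0^2 = omega, hence d sigma_0^2 / d theta = (1, 0, 0).
    """
    n = len(eps)
    s2 = np.empty(n)
    d = np.empty((n, 3))
    s2[0], d[0] = omega, (1.0, 0.0, 0.0)
    for t in range(1, n):
        s2[t] = omega + alpha * eps[t - 1] ** 2 + beta * s2[t - 1]
        # differentiate sigma_t^2 = omega + alpha eps_{t-1}^2 + beta sigma_{t-1}^2 term by term
        d[t] = np.array([1.0, eps[t - 1] ** 2, s2[t - 1]]) + beta * d[t - 1]
    return d / s2[:, None]

rng = np.random.default_rng(4)
omega0, alpha0, beta0, n = 0.1, 0.1, 0.85, 20_000
eps = np.empty(n)
s2 = omega0 / (1.0 - alpha0 - beta0)
for t in range(n):                                   # simulate the GARCH(1,1) data
    eps[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega0 + alpha0 * eps[t] ** 2 + beta0 * s2

Z = garch11_scores(eps, omega0, alpha0, beta0)
Omega = Z.mean(axis=0)
J = Z.T @ Z / n
print(Omega @ np.linalg.solve(J, Omega))
```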


7.12 (A condition required for the generalized Bartlett formula)
Using (7.24), show that if the distribution of $\eta_t$ is symmetric and if $E(\epsilon_t^4)<\infty$, then formula (B.13) holds true, that is,

$$E\epsilon_{t_1}\epsilon_{t_2}\epsilon_{t_3}\epsilon_{t_4}=0\quad\text{when }t_1\ne t_2,\ t_1\ne t_3\text{ and }t_1\ne t_4.$$


7.13 (Constrained QMLE of the parameter $\alpha_0$ of a nonstationary ARCH(1) process)
Jensen and Rahbek (2004a) consider the ARCH(1) model (7.15), in which the parameter $\omega_0>0$ is assumed to be known ($\omega_0=1$ for instance) and where only $\alpha_0$ is unknown. They work with the constrained QMLE of $\alpha_0$ defined by

$$\hat\alpha_n^c(\omega_0)=\arg\min_{\alpha\in[0,\infty)}\frac{1}{n}\sum_{t=1}^{n}\ell_t(\alpha),\quad\ell_t(\alpha)=\frac{\epsilon_t^2}{\sigma_t^2(\alpha)}+\log\sigma_t^2(\alpha),\tag{7.97}$$

where $\sigma_t^2(\alpha)=\omega_0+\alpha\epsilon_{t-1}^2$. Assume therefore that $\omega_0=1$ and suppose that the nonstationarity condition (7.16) is satisfied.


1. Verify that 


and that 


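2. Prove that

$$\frac{1}{\sqrt n}\sum_{t=1}^{n}\frac{\partial\ell_t(\alpha_0)}{\partial\alpha}\xrightarrow{\mathcal L}\mathcal N\left(0,\frac{\kappa_\eta-1}{\alpha_0^2}\right).$$

3. Determine the almost sure limit of

$$\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2\ell_t(\alpha_0)}{\partial\alpha^2}.$$

4. Show that for all $\alpha>0$, almost surely

$$\left|\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^3\ell_t(\alpha)}{\partial\alpha^3}\right|=O(1).$$

5. Prove that if $\hat\alpha=\hat\alpha_n^c(\omega_0)\to\alpha_0$ almost surely (see Exercise 7.14) then

$$\sqrt n(\hat\alpha-\alpha_0)\xrightarrow{\mathcal L}\mathcal N\{0,(\kappa_\eta-1)\alpha_0^2\}.$$

6. Does the result change when $\hat\alpha=\hat\alpha_n^c(1)$ and $\omega_0\ne1$?
7. Discuss the practical usefulness of this result for estimating ARCH models.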
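The following sketch illustrates Exercises 7.13 and 7.14 on simulated data. It assumes Gaussian $\eta_t$, $\omega_0=1$ and $\alpha_0=5$, for which $E\log(\alpha_0\eta_t^2)=\log5-(\gamma_E+\log2)\approx0.34>0$ ($\gamma_E$ denoting Euler's constant), so that the simulated process is explosive. It minimizes the criterion in (7.97) over a grid rather than with a numerical optimizer. With $\omega$ held at its true value, the estimate lands close to $\alpha_0$ even though $\epsilon_t^2$ grows geometrically.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha0, n = 5.0, 500
eps = np.empty(n + 1)
eps[0] = 1.0                                  # hypothetical initial observation
for t in range(1, n + 1):                     # explosive ARCH(1) with omega_0 = 1
    eps[t] = np.sqrt(1.0 + alpha0 * eps[t - 1] ** 2) * rng.standard_normal()

def jr_criterion(alpha, eps, omega=1.0):
    """n^{-1} sum_t {eps_t^2 / sigma_t^2(alpha) + log sigma_t^2(alpha)}, omega held fixed."""
    s2 = omega + alpha * eps[:-1] ** 2
    return float(np.mean(eps[1:] ** 2 / s2 + np.log(s2)))

grid = np.linspace(0.01, 20.0, 4_000)
values = [jr_criterion(a, eps) for a in grid]
alpha_hat = float(grid[int(np.argmin(values))])
print(alpha_hat, np.log10(eps[-1] ** 2))      # estimate and order of magnitude of eps_n^2
```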


7.14 (Strong consistency of Jensen and Rahbek's estimator)
We consider the framework of Exercise 7.13, and follow the lines of the proof of (7.19) on page 169.
1. Show that $\hat\alpha_n^c(1)$ converges almost surely to $\alpha_0$ when $\omega_0=1$.
2. Does the result change if $\hat\alpha_n^c(1)$ is replaced by $\hat\alpha_n^c(\omega)$ and if $\omega$ and $\omega_0$ are arbitrary positive numbers? Does it entail the convergence result (7.19)?


Tests Based on the Likelihood 


In the previous chapter, we saw that the asymptotic normality of the QMLE of a GARCH model 
holds true under general conditions, in particular without any moment assumption on the observed 
process. An important application of this result concerns testing problems. In particular, we are able 
to test the IGARCH assumption, or more generally a given GARCH model with infinite variance. 
This problem is the subject of Section 8.1. 

The main aim of this chapter is to derive tests for the nullity of coefficients. These tests are complex in the GARCH case, because of the constraints that are imposed on the estimates of the coefficients to guarantee that the estimated conditional variance is positive. Without these constraints, it is impossible to compute the Gaussian log-likelihood of the GARCH model. Moreover, asymptotic normality of the QMLE has been established assuming that the parameter belongs to the interior of the parameter space (assumption A5 in Chapter 7). When some coefficients $\alpha_i$ or $\beta_j$ are null, Theorem 7.2 does not apply. It is easy to see that, in such a situation, the asymptotic distribution of $\sqrt n(\hat\theta_n-\theta_0)$ cannot be Gaussian. Indeed, the components $\hat\theta_{in}$ of $\hat\theta_n$ are constrained to be positive or null. If, for instance, $\theta_{0i}=0$ then $\sqrt n(\hat\theta_{in}-\theta_{0i})=\sqrt n\,\hat\theta_{in}\ge0$ for all $n$ and the asymptotic distribution of this variable cannot be Gaussian.

Before considering significance tests, we shall therefore establish in Section 8.2 the asymptotic 
distribution of the QMLE without assumption A5, at the cost of a moment assumption on the 
observed process. In Section 8.3, we present the main tests (Wald, score and likelihood ratio) 
used for testing the nullity of some coefficients. The asymptotic distribution obtained for the 
QMLE will lead to modification of the standard critical regions. Two cases of particular interest 
will be examined in detail: the test of nullity of only one coefficient and the test of conditional 
homoscedasticity, which corresponds to the nullity of all the coefficients $\alpha_i$ and $\beta_j$. Section 8.4
is devoted to testing the adequacy of a particular GARCH(p, q) model, using portmanteau tests. 
The chapter also contains a numerical application in which the preeminence of the GARCH(1, 1) 
model is questioned. 
8.1 Test of the Second-Order Stationarity Assumption

For the GARCH(p, q) model defined by (7.1), testing for second-order stationarity involves testing

$$H_0:\sum_{i=1}^{q}\alpha_{0i}+\sum_{j=1}^{p}\beta_{0j}<1\quad\text{against}\quad H_1:\sum_{i=1}^{q}\alpha_{0i}+\sum_{j=1}^{p}\beta_{0j}\ge1.$$

Introducing the vector $c=(0,1,\ldots,1)'\in\mathbb R^{p+q+1}$, the testing problem is

$$H_0:c'\theta_0<1\quad\text{against}\quad H_1:c'\theta_0\ge1.\tag{8.1}$$
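In view of Theorem 7.2, the QMLE $\hat\theta_n=(\hat\omega_n,\hat\alpha_{1n},\ldots,\hat\alpha_{qn},\hat\beta_{1n},\ldots,\hat\beta_{pn})'$ of $\theta_0$ satisfies

$$\sqrt n(\hat\theta_n-\theta_0)\xrightarrow{\mathcal L}\mathcal N\{0,(\kappa_\eta-1)J^{-1}\},$$

under assumptions which are compatible with $H_0$ and $H_1$. In particular, if $c'\theta_0=1$ we have

$$\sqrt n(c'\hat\theta_n-1)\xrightarrow{\mathcal L}\mathcal N\{0,(\kappa_\eta-1)c'J^{-1}c\}.$$

It is thus natural to consider the Wald statistic

$$T_n=\frac{\sqrt n\left(\sum_{i=1}^{q}\hat\alpha_{in}+\sum_{j=1}^{p}\hat\beta_{jn}-1\right)}{\left\{(\hat\kappa_\eta-1)c'\hat J^{-1}c\right\}^{1/2}},$$

where $\hat\kappa_\eta$ and $\hat J$ are consistent estimators in probability of $\kappa_\eta$ and $J$. The following result follows immediately from the convergence of $T_n$ to $\mathcal N(0,1)$ when $c'\theta_0=1$.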


Proposition 8.1 (Critical region of stationarity test) Under the assumptions of Theorem 7.2, a test of (8.1) at the asymptotic level $\alpha$ is defined by the rejection region

$$\{T_n>\Phi^{-1}(1-\alpha)\},$$

where $\Phi$ is the $\mathcal N(0,1)$ cumulative distribution function.


Table 8.1 Test of the infinite variance assumption for 11 stock 
market returns. Estimated standard deviations are in parentheses. 


Index      $\hat\alpha+\hat\beta$ (s.e.)    p-value
CAC        0.983 (0.007)       0.0089
DAX        0.981 (0.011)       0.0385
DJA        0.982 (0.007)       0.0039
DJI        0.986 (0.006)       0.0061
DJT        0.983 (0.009)       0.0023
DJU        0.983 (0.007)       0.0060
FTSE       0.990 (0.006)       0.0525
Nasdaq     0.993 (0.003)       0.0296
Nikkei     0.980 (0.007)       0.0017
SMI        0.962 (0.015)       0.0050
S&P 500    0.989 (0.005)       0.0157
Note that for most real series (see, for instance, Table 7.4), the sum of the estimated coefficients $\hat\alpha$ and $\hat\beta$ is strictly less than 1: second-order stationarity thus cannot be rejected, for any reasonable asymptotic level (when $T_n<0$, the p-value of the test is greater than 1/2). Of course, the nonrejection of $H_0$ does not mean that stationarity is proved. It is interesting to test the reverse assumption that the data generating process is an IGARCH, or more generally that it does not have moments of order 2. We thus consider the problem

$$H_0:c'\theta_0\ge1\quad\text{against}\quad H_1:c'\theta_0<1.\tag{8.2}$$


Proposition 8.2 (Critical region of nonstationarity test) Under the assumptions of Theorem 7.2, a test of (8.2) at the asymptotic level $\alpha$ is defined by the rejection region

$$\{T_n<\Phi^{-1}(\alpha)\}.$$


As an application, we take up the data sets of Table 7.4 again, and we give the p-values of 
the previous test for the 11 series of daily returns. For the FTSE (DAX, Nasdaq, S&P 500), the 
assumption of infinite variance cannot be rejected at the 5% (3%, 2%, 1%) level (see Table 8.1). 
The other series can be considered as second-order stationary (if one believes in the GARCH(1, 1) 
model, of course). 
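The entries of Table 8.1 can be related to Proposition 8.2 by hand. Writing the reported standard deviation as $\mathrm{s.e.}=\{(\hat\kappa_\eta-1)c'\hat J^{-1}c/n\}^{1/2}$, the statistic is $T_n=(\hat\alpha+\hat\beta-1)/\mathrm{s.e.}$ and the p-value of the test of (8.2) is $\Phi(T_n)$. The sketch below recomputes it from the rounded values printed in the table for a few indices; because of that rounding, the results only approximate the published p-values.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def nonstationarity_pvalue(sum_hat, std_err):
    """T_n = (alpha_hat + beta_hat - 1) / s.e.; small Phi(T_n) rejects c'theta_0 >= 1."""
    t_stat = (sum_hat - 1.0) / std_err
    return t_stat, normal_cdf(t_stat)

rounded = {"CAC": (0.983, 0.007), "FTSE": (0.990, 0.006), "SMI": (0.962, 0.015)}
for index, (sum_hat, std_err) in rounded.items():
    t_stat, p_value = nonstationarity_pvalue(sum_hat, std_err)
    print(f"{index:5s} T_n = {t_stat:6.2f}  approximate p-value = {p_value:.4f}")
```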


8.2 Asymptotic Distribution of the QML When $\theta_0$ is at the Boundary


In view of (7.3) and (7.9), the QMLE $\hat\theta_n$ is constrained to have a strictly positive first component, while the other components are constrained to be positive or null. A general technique for determining the distribution of a constrained estimator involves expressing it as a function of the unconstrained estimator $\hat\theta_n^{nc}$ (see Gouriéroux and Monfort, 1995). For the QMLE of a GARCH, this technique does not work because the objective function

$$\tilde l_n(\theta)=n^{-1}\sum_{t=1}^{n}\tilde\ell_t,\quad\tilde\ell_t=\tilde\ell_t(\theta)=\frac{\epsilon_t^2}{\tilde\sigma_t^2}+\log\tilde\sigma_t^2,\quad\text{where }\tilde\sigma_t^2\text{ is defined by (7.4)},$$

cannot be computed outside $\Theta$ (for an ARCH(1), it may happen that $\tilde\sigma_t^2:=\omega+\alpha_1\epsilon_{t-1}^2$ is negative when $\alpha_1<0$). It is thus impossible to define $\hat\theta_n^{nc}$.

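The technique that we will utilize here (see, in particular, Andrews, 1999) involves writing $\hat\theta_n$ with the aid of the normalized score vector, evaluated at $\theta_0$:

$$Z_n:=-J_n^{-1}n^{1/2}\frac{\partial l_n(\theta_0)}{\partial\theta},\quad J_n:=\frac{\partial^2l_n(\theta_0)}{\partial\theta\,\partial\theta'},\tag{8.3}$$

with

$$l_n(\theta)=l_n(\theta;\epsilon_n,\epsilon_{n-1},\ldots)=n^{-1}\sum_{t=1}^{n}\ell_t,\quad\ell_t=\ell_t(\theta)=\frac{\epsilon_t^2}{\sigma_t^2}+\log\sigma_t^2,$$

where the components of $\partial l_n(\theta_0)/\partial\theta$ and of $J_n$ are right derivatives (see (a) in the proof of Theorem 8.1 on page 207).
In the proof of Theorem 7.2, we showed that

$$n^{1/2}(\hat\theta_n-\theta_0)=Z_n+o_P(1),\quad\text{when }\theta_0\in\overset{\circ}{\Theta}.\tag{8.4}$$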
For any value of $\theta_0\in\Theta$ (even when $\theta_0\notin\overset{\circ}{\Theta}$), it will be shown that the vector $Z_n$ is well defined and satisfies

$$Z_n\xrightarrow{\mathcal L}Z\sim\mathcal N\{0,(\kappa_\eta-1)J^{-1}\},\quad J=E_{\theta_0}\left(\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\theta_0)\right),\tag{8.5}$$

provided $J$ exists. By contrast, when

$$\theta_0\in\partial\Theta:=\{\theta\in\Theta:\theta_i=0\text{ for some }i>1\},$$

equation (8.4) is no longer valid. However, we will show that the asymptotic distribution of $n^{1/2}(\hat\theta_n-\theta_0)$ is well approximated by that of the vector $n^{1/2}(\theta-\theta_0)$ which is located at minimal distance from $Z_n$, under the constraint $\theta\in\Theta$. Consider thus a random vector $\theta_{J_n}(Z_n)$ (which is not an estimator, of course) solving the minimization problem

$$\theta_{J_n}(Z_n)=\arg\inf_{\theta\in\Theta}\left\{Z_n-n^{1/2}(\theta-\theta_0)\right\}'J_n\left\{Z_n-n^{1/2}(\theta-\theta_0)\right\}.\tag{8.6}$$

It will be shown that $J_n$ converges to the positive definite matrix $J$. For $n$ large enough, we thus have

$$\theta_{J_n}(Z_n)=\arg\min_{\theta\in\Theta}\mathrm{dist}_{J_n}^2\left\{Z_n,\sqrt n(\theta-\theta_0)\right\},$$

where $\mathrm{dist}_{J_n}(x,y):=\{(x-y)'J_n(x-y)\}^{1/2}$ is a distance between two points $x$ and $y$ of $\mathbb R^{p+q+1}$, and where the distance between a point $x$ and a subset $S$ of $\mathbb R^{p+q+1}$ is defined by $\mathrm{dist}_{J_n}(x,S)=\inf_{s\in S}\mathrm{dist}_{J_n}(x,s)$.

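We allow $\theta_0$ to have null components, but we do not consider the (less interesting) case where $\theta_0$ reaches another boundary of $\Theta$. More precisely, we assume that

B1: $\theta_0\in(\underline\omega,\overline\omega)\times[0,\overline\theta_2)\times\cdots\times[0,\overline\theta_{p+q+1})\subset\Theta$,

where $0<\underline\omega<\overline\omega$ and $0<\min\{\overline\theta_2,\ldots,\overline\theta_{p+q+1}\}$. In this case $\sqrt n(\theta_{J_n}(Z_n)-\theta_0)$ and $\sqrt n(\hat\theta_n-\theta_0)$ belong to the 'local parameter space'

$$\Lambda:=\bigcup_n\sqrt n(\Theta-\theta_0)=\Lambda_1\times\cdots\times\Lambda_{p+q+1},\tag{8.7}$$

where $\Lambda_1=\mathbb R$ and, for $i=2,\ldots,p+q+1$, $\Lambda_i=\mathbb R$ if $\theta_{0i}\ne0$ and $\Lambda_i=[0,\infty)$ if $\theta_{0i}=0$. With the notation

$$\lambda_n^\Lambda=\arg\inf_{\lambda\in\Lambda}\{\lambda-Z_n\}'J_n\{\lambda-Z_n\},$$

we thus have, with probability 1,

$$\sqrt n(\theta_{J_n}(Z_n)-\theta_0)=\lambda_n^\Lambda,\quad\text{for }n\text{ large enough}.\tag{8.8}$$

The vector $\lambda_n^\Lambda$ is the projection of $Z_n$ on $\Lambda$, with respect to the norm $\|x\|_{J_n}^2:=x'J_nx$ (see Figure 8.1). Since $\Lambda$ is closed and convex, such a projection is unique. We will show that

$$n^{1/2}(\hat\theta_n-\theta_0)=\lambda_n^\Lambda+o_P(1).\tag{8.9}$$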


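Since $(Z_n,J_n)$ tends in law to $(Z,J)$ and $\lambda_n^\Lambda$ is a function of $(Z_n,J_n)$ which is continuous everywhere except at the points where $J_n$ is singular (that is, almost everywhere with respect to the distribution of $(Z,J)$ because $J$ is invertible), we have $\lambda_n^\Lambda\xrightarrow{\mathcal L}\lambda^\Lambda$, where $\lambda^\Lambda$ is the solution of the limiting problem

$$\lambda^\Lambda=\arg\inf_{\lambda\in\Lambda}\{\lambda-Z\}'J\{\lambda-Z\},\quad Z\sim\mathcal N\{0,(\kappa_\eta-1)J^{-1}\}.\tag{8.10}$$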


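[Figure 8.1: the local parameter space $\Lambda=\mathbb R\times[0,\infty)$ in the plane with axes $\sqrt n(\omega-\omega_0)$ and $\sqrt n(\alpha-\alpha_0)$, showing the gray area and the point $Z_n$.]

Figure 8.1 ARCH(1) model with $\theta_0=(\omega_0,0)$ and $\Theta=[\underline\omega,\overline\omega]\times[0,\overline\alpha]$: $\sqrt n(\Theta-\theta_0)=[-\sqrt n(\omega_0-\underline\omega),\sqrt n(\overline\omega-\omega_0)]\times[0,\sqrt n(\overline\alpha-\alpha_0)]$ is the gray area; $Z_n\xrightarrow{\mathcal L}\mathcal N\{0,(\kappa_\eta-1)J^{-1}\}$; $\sqrt n(\hat\theta_n-\theta_0)$ and $\lambda_n^\Lambda$ have the same asymptotic distribution.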


In addition to B1, we retain most of the assumptions of Theorem 7.2:

B2: $\theta_0\in\Theta$ and $\Theta$ is a compact set.

B3: $\gamma(A_0)<0$ and for all $\theta\in\Theta$, $\sum_{j=1}^{p}\beta_j<1$.

B4: $\eta_t^2$ has a nondegenerate distribution with $E\eta_t^2=1$.

B5: If $p>0$, $\mathcal A_{\theta_0}(z)$ and $\mathcal B_{\theta_0}(z)$ do not have common roots, $\mathcal A_{\theta_0}(1)\ne0$, and $\alpha_{0q}+\beta_{0p}\ne0$.

B6: $\kappa_\eta=E\eta_t^4<\infty$.

We also need the following moment assumption:

B7: $E\epsilon_t^6<\infty$.


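When $\theta_0\in\overset{\circ}{\Theta}$, we can show the existence of the information matrix

$$J=E_{\theta_0}\left(\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\theta_0)\right)$$

without moment assumptions similar to B7. The following example shows that, in the ARCH case, this is no longer possible when we allow $\theta_0\in\partial\Theta$.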
Example 8.1 (The existence of $J$ may require a moment of order 4) Consider the ARCH(2) model

$$\epsilon_t=\sigma_t\eta_t,\quad\sigma_t^2=\omega_0+\alpha_{01}\epsilon_{t-1}^2+\alpha_{02}\epsilon_{t-2}^2,\tag{8.11}$$

where the true values of the parameters are such that $\omega_0>0$, $\alpha_{01}>0$, $\alpha_{02}=0$, and the distribution of the iid sequence $(\eta_t)$ is defined, for $a>1$, by

$$P(\eta_t=a)=P(\eta_t=-a)=\frac{1}{2a^2},\quad P(\eta_t=0)=1-\frac{1}{a^2}.$$

The process $(\epsilon_t)$ is always stationary, for any value of $\alpha_{01}$ (since $\exp\{-E(\log\eta_t^2)\}=+\infty$, the strict stationarity constraint (2.10) holds true). By contrast, $\epsilon_t$ does not possess moments of order 2 when $\alpha_{01}>1$ (see Proposition 2.2).


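We have

$$\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\alpha_2}(\theta_0)=\frac{\epsilon_{t-2}^2}{\omega_0+\alpha_{01}\epsilon_{t-1}^2}$$

so that

$$E\left\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\alpha_2}(\theta_0)\right\}^2\ge E\left\{\left(\frac{\epsilon_{t-2}^2}{\omega_0+\alpha_{01}\epsilon_{t-1}^2}\right)^2\,\Big|\,\eta_{t-1}=0\right\}P(\eta_{t-1}=0)=\frac{1}{\omega_0^2}\left(1-\frac{1}{a^2}\right)E(\epsilon_{t-2}^4)$$

because on the one hand $\eta_{t-1}=0$ entails $\epsilon_{t-1}=0$, and on the other hand $\eta_{t-1}$ and $\epsilon_{t-2}$ are independent. Consequently, if $E\epsilon_t^4=\infty$ then the matrix $J$ does not exist.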
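The distribution of Example 8.1 is easy to examine numerically. The sketch below checks that $E\eta_t^2=1$ and $\kappa_\eta=E\eta_t^4=a^2$, and evaluates the lower bound $(1-a^{-2})E(\epsilon_t^4)/\omega_0^2$ obtained above. It relies on the standard ARCH(1) result (here $\alpha_{02}=0$) that $E\epsilon_t^4<\infty$ if and only if $\kappa_\eta\alpha_{01}^2<1$, in which case $E\epsilon_t^4=\kappa_\eta\omega_0^2(1+\alpha_{01})/\{(1-\alpha_{01})(1-\kappa_\eta\alpha_{01}^2)\}$. The parameter values and helper names are illustrative.

```python
import math
import numpy as np

def eta_law(a):
    """Support and probabilities of eta_t in Example 8.1 (requires a > 1)."""
    values = np.array([-a, 0.0, a])
    probs = np.array([1.0 / (2 * a**2), 1.0 - 1.0 / a**2, 1.0 / (2 * a**2)])
    return values, probs

def moment(values, probs, k):
    return float(np.sum(probs * values**k))

def lower_bound_j22(omega0, alpha01, a):
    """(1 - 1/a^2) E(eps^4) / omega0^2, which is infinite when kappa * alpha01^2 >= 1."""
    kappa = a**2
    if kappa * alpha01**2 >= 1.0:
        return math.inf
    e_eps4 = kappa * omega0**2 * (1 + alpha01) / ((1 - alpha01) * (1 - kappa * alpha01**2))
    return (1.0 - 1.0 / a**2) * e_eps4 / omega0**2

values, probs = eta_law(3.0)
print(moment(values, probs, 2), moment(values, probs, 4))
print(lower_bound_j22(1.0, 0.3, 3.0), lower_bound_j22(1.0, 0.5, 3.0))
```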
We then have the following result. 


Theorem 8.1 (QML asymptotic distribution at the boundary) Under assumptions B1-B7, the asymptotic distribution of $\sqrt n(\hat\theta_n-\theta_0)$ is that of $\lambda^\Lambda$ satisfying (8.10), where $\Lambda$ is given by (8.7).

Remark 8.1 (We retrieve the standard results in $\overset{\circ}{\Theta}$) For $\theta_0\in\overset{\circ}{\Theta}$, the result is shown in Theorem 7.2. Indeed, in this case $\Lambda=\mathbb R^{p+q+1}$ and

$$\lambda^\Lambda=Z\sim\mathcal N\{0,(\kappa_\eta-1)J^{-1}\}.$$

Theorem 8.1 is thus only of interest when $\theta_0$ is at the boundary $\partial\Theta$ of the parameter space.


Remark 8.2 (The moment condition B7 can sometimes be relaxed) Apart from the ARCH(q) case, it is sometimes possible to get rid of the moment assumption B7. Note that under the condition $\gamma(A_0)<0$, we have $\sigma_t^2(\theta_0)=b_{00}+\sum_{j=1}^{\infty}b_{0j}\epsilon_{t-j}^2$ with $b_{00}>0$, $b_{0j}\ge0$. The derivatives $\partial\sigma_t^2/\partial\theta_i$ have the form of similar series. It can be shown that the ratio $\{\partial\sigma_t^2/\partial\theta\}/\sigma_t^2$ admits moments of all orders whenever any term $\epsilon_{t-j}^2$ which appears in the numerator is also present in the denominator. This allows us to show (see the references at the end of the chapter) that, in the theorem, assumption B7 can be replaced by

B7': $b_{0j}>0$ for all $j\ge1$, where $\sigma_t^2(\theta_0)=b_{00}+\sum_{j=1}^{\infty}b_{0j}\epsilon_{t-j}^2$.
Note that a sufficient condition for B7' is $\alpha_{01}>0$ and $\beta_{01}>0$ (because $b_{0j}\ge\alpha_{01}\beta_{01}^{j-1}$). A necessary condition is obviously that $\alpha_{01}>0$ (because $b_{01}=\alpha_{01}$). Finally, a necessary and sufficient condition for B7' is

$$\{j\mid\beta_{0j}>0\}\ne\emptyset\quad\text{and}\quad\prod_{i=1}^{j_0}\alpha_{0i}>0\quad\text{for }j_0=\min\{j\mid\beta_{0j}>0\}.$$

Obviously, according to Example 8.1, assumption B7' is not satisfied in the ARCH case.
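The coefficients $b_{0j}$ of Remark 8.2 are those of the power series $\sum_i\alpha_{0i}z^i/(1-\sum_j\beta_{0j}z^j)$, so B7' can be explored numerically. The sketch below (hypothetical helper names; a finite number of coefficients stands in for "all $j\ge1$", with $\alpha_{0i}=0$ for $i>q$) computes them recursively and compares their positivity with the necessary and sufficient condition stated above on a few illustrative parameter values.

```python
import numpy as np

def arch_inf_coefficients(alpha, beta, n_terms):
    """b_1, ..., b_n in sigma_t^2 = b_0 + sum_j b_j eps_{t-j}^2.

    Uses b_j = alpha_j + sum_k beta_k b_{j-k}, with alpha_j = 0 for j > q and b_0 excluded.
    """
    b = np.zeros(n_terms + 1)          # b[0] is a placeholder equal to 0
    for j in range(1, n_terms + 1):
        b[j] = alpha[j - 1] if j <= len(alpha) else 0.0
        for k in range(1, min(j - 1, len(beta)) + 1):
            b[j] += beta[k - 1] * b[j - k]
    return b[1:]

def b7_prime_condition(alpha, beta):
    """{j : beta_j > 0} nonempty and alpha_1 * ... * alpha_{j0} > 0, j0 = min{j : beta_j > 0}."""
    positive = [j for j, bj in enumerate(beta, start=1) if bj > 0]
    if not positive:
        return False
    j0 = min(positive)
    padded = list(alpha) + [0.0] * max(0, j0 - len(alpha))
    return bool(np.prod(padded[:j0]) > 0)

cases = [([0.1], [0.8]), ([0.3, 0.0], []), ([0.1, 0.0], [0.0, 0.5]),
         ([0.1, 0.1], [0.0, 0.5]), ([0.0, 0.2], [0.7])]
for alpha, beta in cases:
    b = arch_inf_coefficients(alpha, beta, n_terms=40)
    print(alpha, beta, bool(np.all(b > 0)), b7_prime_condition(alpha, beta))
```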


8.2.1 Computation of the Asymptotic Distribution 


In this section, we will show how to compute the solutions of (8.10). Switching the components of $\theta_0$, if necessary, it can be assumed without loss of generality that the vector $\theta_0^{(1)}$ of the first $d_1$ components of $\theta_0$ has strictly positive elements and that the vector $\theta_0^{(2)}$ of the last $d_2=p+q+1-d_1$ components of $\theta_0$ is null. This can be written as

$$K\theta_0=0_{d_2\times1},\quad K=(0_{d_2\times d_1},I_{d_2}).\tag{8.12}$$

More generally, it will be useful to consider all the subsets of these constraints. Let

$$\mathcal K=\{K_1,\ldots,K_{2^{d_2}-1}\}$$

be the set of the matrices obtained by deleting no, one, or several (but not all) rows of $K$. Note that the solution of the constrained minimization problem (8.10) is the unconstrained solution $\lambda=Z$ when the latter satisfies the constraint, that is, when

$$Z\in\Lambda=\mathbb R^{d_1}\times[0,\infty)^{d_2}=\{\lambda\in\mathbb R^{p+q+1}\mid K\lambda\ge0\}.$$
When $Z\notin\Lambda$, the solution $\lambda^\Lambda$ coincides with that of an equality constrained problem of the form

$$\lambda_{K_i}=\arg\min_{\lambda:\,K_i\lambda=0}(\lambda-Z)'J(\lambda-Z),\quad K_i\in\mathcal K.$$

An important difference, compared to the initial minimization program (8.10), is that the minimization is done here on a vector space. The solution is given by a projection (nonorthogonal when $J$ is not the identity matrix). We thus obtain (see Exercise 8.1)

$$\lambda_{K_i}=P_iZ,\quad\text{where }P_i=I_{p+q+1}-J^{-1}K_i'(K_iJ^{-1}K_i')^{-1}K_i\tag{8.13}$$

is the projection matrix (orthogonal for the metric defined by $J$) on the orthogonal subspace of the space generated by the rows of $K_i$. Note that $\lambda_{K_i}$ does not necessarily belong to $\Lambda$ because $K_i\lambda=0$ does not imply that $K\lambda\ge0$. Let $\mathcal C=\{\lambda_{K_i}:K_i\in\mathcal K\text{ and }K\lambda_{K_i}\ge0\}$ be the class of the admissible solutions. It follows that the solution that we are looking for is

$$\lambda^\Lambda=Z\mathbb 1_\Lambda(Z)+\mathbb 1_{\Lambda^c}(Z)\times\arg\min_{\lambda\in\mathcal C}Q(\lambda),\quad\text{where }Q(\lambda)=(\lambda-Z)'J(\lambda-Z).$$

This formula can be used in practice to obtain realizations of $\lambda^\Lambda$ from realizations of $Z$. The $Q(\lambda_{K_i})$ can be obtained by writing

$$Q(P_iZ)=Z'K_i'(K_iJ^{-1}K_i')^{-1}K_iZ.\tag{8.14}$$
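The enumeration just described translates directly into code. The sketch below (a hypothetical helper, not taken from any library) loops over the nonempty subsets of boundary coordinates, computes $P_iZ$ by (8.13) and $Q(P_iZ)$ by (8.14), keeps the admissible candidates and returns the one with the smallest criterion. It is meant for the small values of $d_2$ met in practice, since the number of subsets is $2^{d_2}-1$; a small tolerance absorbs rounding in the admissibility check.

```python
import itertools
import numpy as np

def lambda_projection(Z, J, boundary, tol=1e-10):
    """lambda^Lambda = argmin over Lambda of (lambda - Z)' J (lambda - Z).

    Lambda = {lambda : lambda_i >= 0 for i in boundary}; the other coordinates are free.
    """
    Z = np.asarray(Z, dtype=float)
    if all(Z[i] >= 0 for i in boundary):               # Z in Lambda: nothing to project
        return Z.copy()
    J_inv = np.linalg.inv(J)
    best, best_q = None, np.inf
    for r in range(1, len(boundary) + 1):
        for rows in itertools.combinations(boundary, r):
            K = np.eye(len(Z))[list(rows)]              # K_i: selected rows of the identity
            M = np.linalg.inv(K @ J_inv @ K.T)
            candidate = Z - J_inv @ K.T @ M @ K @ Z     # P_i Z, eq. (8.13)
            if all(candidate[i] >= -tol for i in boundary):   # admissible: K lambda >= 0
                q = float(Z @ K.T @ M @ K @ Z)          # Q(P_i Z), eq. (8.14)
                if q < best_q:
                    best, best_q = candidate, q
    return best

# With J equal to the identity, negative boundary coordinates are simply set to zero.
print(lambda_projection([0.3, -1.2, 0.7, -0.4], np.eye(4), boundary=[1, 2, 3]))
```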
Another expression (of theoretical interest) for $\lambda^\Lambda$ is

$$\lambda^\Lambda=Z\mathbb 1_{D_0}(Z)+\sum_{i=1}^{2^{d_2}-1}P_iZ\,\mathbb 1_{D_i}(Z),$$

where $D_0=\Lambda$ and the $D_i$ form a partition of $\mathbb R^{p+q+1}$. Indeed, according to the zone to which $Z$ belongs, a solution $\lambda^\Lambda=\lambda_{K_i}$ is obtained. We will make these formulas explicit in a few examples. Let $d=p+q+1$, $z^+=z\mathbb 1_{(0,+\infty)}(z)$ and $z^-=z\mathbb 1_{(-\infty,0]}(z)$.


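Example 8.2 (Law when only one component is at the boundary) When $d_2=1$, that is, when only the last component of $\theta_0$ is zero, we have

$$\Lambda=\mathbb R^{d-1}\times[0,\infty),\quad K=(0,\ldots,0,1),\quad\mathcal K=\{K\},$$

and

$$\lambda^\Lambda=Z\mathbb 1_{\{Z_d\ge0\}}+PZ\mathbb 1_{\{Z_d<0\}},\quad P=I_d-J^{-1}K'(KJ^{-1}K')^{-1}K.$$

We finally obtain

$$\lambda^\Lambda=Z-Z_d^-c,$$

where $c$ is the last column of $J^{-1}$ divided by the $(d,d)$th element of this matrix. Note that the last component of $\lambda^\Lambda$ is $Z_d^+$. Noting that $J^{-1}$ is, up to a multiplicative factor, the variance of $Z$, it can also be seen that

$$\lambda^\Lambda=\begin{pmatrix}Z_1-\gamma_1Z_d^-\\\vdots\\Z_{d-1}-\gamma_{d-1}Z_d^-\\Z_d^+\end{pmatrix},\quad\gamma_i=\frac{E(Z_iZ_d)}{\mathrm{Var}(Z_d)}.\tag{8.15}$$

Thus $\lambda_i^\Lambda=Z_i$ if and only if $\mathrm{Cov}(Z_i,Z_d)=0$.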


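Example 8.3 (ARCH(2) model when the data generating process is a white noise) Consider an ARCH(2) model with $\theta_0 = (\omega_0, 0, 0)$. We thus have $d_2 = 2$, $d_1 = 1$ and

$$\Lambda = \mathbb{R} \times [0,\infty)^2, \qquad K = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \mathcal{K} = \{K_1, K_2, K_3\},$$

with $K_1 = K$, $K_2 = (0, 1, 0)$ and $K_3 = (0, 0, 1)$. Exercise 8.6 shows that

$$Z = \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix} \sim \mathcal{N}\left\{0,\ \Sigma = (\kappa_\eta - 1)J^{-1} = \begin{pmatrix} (\kappa_\eta + 1)\omega_0^2 & -\omega_0 & -\omega_0 \\ -\omega_0 & 1 & 0 \\ -\omega_0 & 0 & 1 \end{pmatrix}\right\}.$$

Using $K\Sigma K' = I_2$ and $K_i\Sigma K_i' = 1$ for $i = 2, 3$ in particular, we thus obtain

$$P_1Z = \left(Z_1 + \omega_0(Z_2 + Z_3),\ 0,\ 0\right)', \qquad P_2Z = \left(Z_1 + \omega_0Z_2,\ 0,\ Z_3\right)', \qquad P_3Z = \left(Z_1 + \omega_0Z_3,\ Z_2,\ 0\right)'.$$

Let $P_0 = I_3$ and $K_0 = 0$. Using (8.14), we have

$$Q(P_iZ) = (\kappa_\eta - 1)Z'K_i'K_iZ = (\kappa_\eta - 1) \times \begin{cases} 0 & \text{for } i = 0, \\ Z_2^2 + Z_3^2 & \text{for } i = 1, \\ Z_2^2 & \text{for } i = 2, \\ Z_3^2 & \text{for } i = 3. \end{cases}$$

Keeping, for each sign configuration of $(Z_2, Z_3)$, the admissible projection with the smallest $Q$, this shows that

$$\lambda^\Lambda = \begin{pmatrix} Z_1 + \omega_0Z_2^- + \omega_0Z_3^- \\ Z_2^+ \\ Z_3^+ \end{pmatrix}. \qquad (8.16)$$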
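Formula (8.16) makes it easy to draw realizations of $\lambda^\Lambda$. The sketch below is a Monte Carlo illustration under assumed values ($\omega_0 = 1$, $\kappa_\eta = 3$, a hypothetical function name `draw_lambda_arch2`). It generates $Z_2, Z_3$ as independent $\mathcal{N}(0,1)$ variables and builds $Z_1 = -\omega_0(Z_2 + Z_3) + \sqrt{\kappa_\eta - 1}\,\omega_0E$ with an independent $E \sim \mathcal{N}(0,1)$, which reproduces $\mathrm{Var}(Z_1) = (\kappa_\eta + 1)\omega_0^2$ and $\mathrm{Cov}(Z_1, Z_i) = -\omega_0$. Since $E(Z^-) = -1/\sqrt{2\pi}$ for $Z \sim \mathcal{N}(0,1)$, the first component of $\lambda^\Lambda$ has mean $-2\omega_0/\sqrt{2\pi}$, while each ARCH component is zero with probability 1/2.

```python
import math
import random


def draw_lambda_arch2(omega0, kappa, rng):
    """One draw of lambda^Lambda from (8.16) for an ARCH(2) fitted to a white noise."""
    z2, z3, e = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Var(Z1) = (kappa + 1) omega0^2 and Cov(Z1, Z2) = Cov(Z1, Z3) = -omega0, as in Sigma
    z1 = -omega0 * (z2 + z3) + math.sqrt(kappa - 1.0) * omega0 * e
    return (z1 + omega0 * min(z2, 0.0) + omega0 * min(z3, 0.0),  # Z1 + w0 Z2^- + w0 Z3^-
            max(z2, 0.0),                                        # Z2^+
            max(z3, 0.0))                                        # Z3^+


def summarize(omega0=1.0, kappa=3.0, n_draws=100_000, seed=1):
    """Monte Carlo mean of the first component and boundary probabilities."""
    rng = random.Random(seed)
    draws = [draw_lambda_arch2(omega0, kappa, rng) for _ in range(n_draws)]
    mean1 = sum(d[0] for d in draws) / n_draws
    p_alpha1_zero = sum(d[1] == 0.0 for d in draws) / n_draws
    p_both_zero = sum(d[1] == 0.0 and d[2] == 0.0 for d in draws) / n_draws
    return mean1, p_alpha1_zero, p_both_zero


mean1, p_alpha1_zero, p_both_zero = summarize()
print(f"E(lambda_1): {mean1:.3f}  (theory: {-2 / math.sqrt(2 * math.pi):.3f})")
print(f"P(lambda_2 = 0): {p_alpha1_zero:.3f}  (theory: 0.5)")
print(f"P(lambda_2 = lambda_3 = 0): {p_both_zero:.3f}  (theory: 0.25)")
```

The nonzero mean of the first component illustrates that, when the ARCH coefficients are at the boundary, the limit law of $\sqrt{n}(\hat\omega - \omega_0)$ is not centered.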


In order to obtain a slightly simpler expression for the projections defined in (8.13), note that the constraint (8.12) can again be written as

$$\theta_0 = H\theta_0^{(1)}, \qquad H = \begin{pmatrix} I_{d_1} \\ 0_{d_2 \times d_1} \end{pmatrix}. \qquad (8.17)$$


We define a dual of $\mathcal{K}$ by

$$\mathcal{H} = \{H_0, \dots, H_{2^{d_2}-1}\},$$

the set of the matrices obtained by deleting from 0 to $d_2$ of the last $d_2$ columns of the matrix $I_d$. Note that the elements of $\mathcal{H}$ can always be numbered in such a way that $H_0 = I_d$ corresponds to the absence of constraint on $\theta_0$ and that, for $i = 1, \dots, 2^{d_2} - 1$, the constraint $K_i\theta_0 = 0$ corresponds to the constraint $\theta_0 = H_iH_i'\theta_0$. Exercise 8.2 then shows that


$$P_iZ = H_i\left(H_i'JH_i\right)^{-1}H_i'JZ, \qquad (8.18)$$

for $i = 0, \dots, 2^{d_2} - 1$ (with $P_0 = I_d$). Note that (8.18) requires the inversion of only one matrix of size $(d-k) \times (d-k)$ ($d - k$ being the number of columns of $H_i$), whereas (8.13) requires the inversion of one matrix of size $d \times d$ and another matrix of size $k \times k$. To illustrate this new formula, we return to our previous examples.


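Example 8.4 (Example 8.2 continued) We have

$$H = \begin{pmatrix} I_{d-1} \\ 0_{1 \times (d-1)} \end{pmatrix}, \qquad \mathcal{H} = \{I_d, H\},$$

and

$$J = \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix}, \qquad Z = \begin{pmatrix} Z^{(1)} \\ Z^{(2)} \end{pmatrix},$$

where the matrix $J_{11}$ is of size $d_1 \times d_1$, the vectors $J_{12} = J_{21}'$ and $Z^{(1)}$ are of size $d_1 \times 1$, and $J_{22}$ and $Z^{(2)} = Z_d$ are scalars. We finally obtain

$$PZ = H\left(H'JH\right)^{-1}H'JZ = \begin{pmatrix} Z^{(1)} + J_{11}^{-1}J_{12}Z^{(2)} \\ 0 \end{pmatrix}, \qquad \lambda^\Lambda = \begin{pmatrix} Z^{(1)} + J_{11}^{-1}J_{12}\left(Z^{(2)}\right)^- \\ \left(Z^{(2)}\right)^+ \end{pmatrix},$$

which can also be shown using (8.15) and

$$J^{-1}K'\left(KJ^{-1}K'\right)^{-1} = \begin{pmatrix} -J_{11}^{-1}J_{12} \\ 1 \end{pmatrix}.$$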
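Example 8.5 (Example 8.3 continued) We have

$$H = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad \mathcal{H} = \{I_3, H_1, H_2, H_3\},$$

with $H_1 = H$,

$$H_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad H_3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad J = \frac{1}{\omega_0^2}\begin{pmatrix} 1 & \omega_0 & \omega_0 \\ \omega_0 & \kappa_\eta\omega_0^2 & \omega_0^2 \\ \omega_0 & \omega_0^2 & \kappa_\eta\omega_0^2 \end{pmatrix},$$

which allows us to obtain

$$P_iZ = H_i\left(H_i'JH_i\right)^{-1}H_i'JZ = \begin{cases} Z & \text{for } i = 0, \\ \left(Z_1 + \omega_0(Z_2 + Z_3),\ 0,\ 0\right)' & \text{for } i = 1, \\ \left(Z_1 + \omega_0Z_2,\ 0,\ Z_3\right)' & \text{for } i = 2, \\ \left(Z_1 + \omega_0Z_3,\ Z_2,\ 0\right)' & \text{for } i = 3, \end{cases}$$

and to retrieve (8.16). Note that, in this example, the calculations are simpler with (8.13) than with (8.18), because $J^{-1}$ has a simpler expression than $J$.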
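The equivalence of (8.13) and (8.18) (Exercise 8.2) can be checked numerically on this example. The sketch below assumes illustrative values $\omega_0 = 0.5$, $\kappa_\eta = 3$ and $Z = (1, 2, -1)'$; the helper names are hypothetical, and small Gauss-Jordan inversions stand in for a linear-algebra library. A set `zeroed` of coordinate indices plays the role of $K_i$ (rows of $I_d$ kept) in (8.13) and of $H_i$ (columns of $I_d$ removed) in (8.18).

```python
def inverse(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def proj_8_13(J, z, zeroed):
    """P_i z = z - J^{-1} K_i' (K_i J^{-1} K_i')^{-1} K_i z; K_i selects the rows in `zeroed`."""
    Jinv = inverse(J)
    T = sorted(zeroed)
    inner = inverse([[Jinv[r][c] for c in T] for r in T])   # (K_i J^{-1} K_i')^{-1}
    b = matvec(inner, [z[t] for t in T])                     # (K_i J^{-1} K_i')^{-1} K_i z
    corr = [sum(Jinv[r][t] * b[k] for k, t in enumerate(T)) for r in range(len(z))]
    return [zr - cr for zr, cr in zip(z, corr)]


def proj_8_18(J, z, zeroed):
    """P_i z = H_i (H_i' J H_i)^{-1} H_i' J z; H_i keeps the columns of I_d not in `zeroed`."""
    S = [i for i in range(len(z)) if i not in zeroed]
    jz = matvec(J, z)
    a = matvec(inverse([[J[r][c] for c in S] for r in S]), [jz[s] for s in S])
    out = [0.0] * len(z)
    for k, s in enumerate(S):
        out[s] = a[k]
    return out


# ARCH(2) information matrix at a white noise, illustrative omega_0 = 0.5, kappa_eta = 3
omega0, kappa = 0.5, 3.0
J = [[1 / omega0**2, 1 / omega0, 1 / omega0],
     [1 / omega0, kappa, 1.0],
     [1 / omega0, 1.0, kappa]]
z = [1.0, 2.0, -1.0]
for zeroed in ({1, 2}, {1}, {2}):       # K_1, K_2, K_3 of Example 8.3
    print(sorted(zeroed), proj_8_13(J, z, zeroed), proj_8_18(J, z, zeroed))
```

Up to rounding, both formulas return $(1.5, 0, 0)'$, $(2, 0, -1)'$ and $(0.5, 2, 0)'$, the values given by the closed forms above.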


8.3 Significance of the GARCH Coefficients 


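We make the assumptions of Theorem 8.1 and use the notation of Section 8.2.1. Assume $d_2 > 0$, and consider the testing problem

$$H_0: \theta_0^{(2)} = 0 \quad \text{against} \quad H_1: \theta_0^{(2)} \neq 0. \qquad (8.19)$$

Recall that under $H_0$, we have

$$\sqrt{n}\,\hat\theta_n^{(2)} \xrightarrow{\mathcal{L}} K\lambda^\Lambda, \qquad K = \left(0_{d_2 \times d_1}, I_{d_2}\right),$$

where the distribution of $\lambda^\Lambda$ is defined by

$$\lambda^\Lambda = \arg\inf_{\lambda \in \Lambda}\{\lambda - Z\}'J\{\lambda - Z\}, \qquad Z \sim \mathcal{N}\left\{0, (\kappa_\eta - 1)J^{-1}\right\}, \qquad (8.20)$$

with $\Lambda = \mathbb{R}^{d_1} \times [0,\infty)^{d_2}$.


8.3.1 Tests and Rejection Regions


For parametric assumptions of the form (8.19), the most popular tests are the Wald, score and likelihood ratio tests.


Wald Statistic


The Wald test looks at whether $\hat\theta_n^{(2)}$ is close to 0. The usual Wald statistic is defined by

$$W_n = n\,\hat\theta_n^{(2)\prime}\left\{K\hat\Sigma K'\right\}^{-1}\hat\theta_n^{(2)},$$

where $\hat\Sigma$ is a consistent estimator of $\Sigma = (\kappa_\eta - 1)J^{-1}$.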
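Score (or Lagrange Multiplier or Rao) Statistic


Let

$$\hat\theta_{n|2} = \begin{pmatrix} \hat\theta_{n|2}^{(1)} \\ 0 \end{pmatrix}$$

denote the QMLE of $\theta$ constrained by $\theta^{(2)} = 0$. The score test aims to determine whether $\partial\tilde l_n(\hat\theta_{n|2})/\partial\theta$ is not too far from 0, using a statistic of the form

$$R_n = \frac{n}{\hat\kappa_{\eta|2} - 1}\,\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta'}\,\hat J_{n|2}^{-1}\,\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta},$$

where $\hat\kappa_{\eta|2}$ and $\hat J_{n|2}$ denote consistent estimators of $\kappa_\eta$ and $J$.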


Likelihood Ratio Statistic


The likelihood ratio test is based on the fact that under $H_0: \theta^{(2)} = 0$, the constrained (quasi) log-likelihood $\log L_n(\hat\theta_{n|2}) = -(n/2)\tilde l_n(\hat\theta_{n|2})$ should not be much smaller than the unconstrained log-likelihood $-(n/2)\tilde l_n(\hat\theta_n)$. The test employs the statistic

$$L_n = n\left\{\tilde l_n(\hat\theta_{n|2}) - \tilde l_n(\hat\theta_n)\right\}.$$
Usual Rejection Regions 


From the practical viewpoint, the score statistic presents the advantage of only requiring constrained estimation, which is sometimes much simpler than the unconstrained estimation required by the two other tests. The likelihood ratio statistic does not require estimation of the information matrix $J$ or of the kurtosis coefficient $\kappa_\eta$. For each test, it is clear that the null hypothesis must be rejected for large values of the statistic. For standard statistical problems, the three statistics asymptotically follow the same $\chi^2_{d_2}$ distribution under the null. At the asymptotic level $\alpha$, the standard rejection regions are thus

$$\left\{W_n > \chi^2_{d_2}(1-\alpha)\right\}, \qquad \left\{R_n > \chi^2_{d_2}(1-\alpha)\right\}, \qquad \left\{L_n > \chi^2_{d_2}(1-\alpha)\right\},$$

where $\chi^2_{d_2}(1-\alpha)$ is the $(1-\alpha)$-quantile of the $\chi^2$ distribution with $d_2$ degrees of freedom. In the case $d_2 = 1$, for testing the significance of only one coefficient, the most widely used test is Student’s $t$ test, defined by the rejection region

$$\left\{|t_n| > \Phi^{-1}(1 - \alpha/2)\right\}, \qquad (8.21)$$

where $t_n = \sqrt{n}\,\hat\theta_n^{(2)}\left\{K\hat\Sigma K'\right\}^{-1/2}$. This test is equivalent to the standard Wald test because $t_n = \sqrt{W_n}$ ($t_n$ being here always positive or null, because of the positivity constraints of the QML estimates) and

$$\left\{\Phi^{-1}(1 - \alpha/2)\right\}^2 = \chi^2_1(1-\alpha).$$

Our testing problem is not standard because, by Theorem 8.1, the asymptotic distribution of $\hat\theta_n$ is not normal. We will see that, among the previous rejection regions, only that of the score test asymptotically has the level $\alpha$.


8.3.2 Modification of the Standard Tests 


The following proposition shows that for the Wald and likelihood ratio tests, the asymptotic distribution is not the usual $\chi^2_{d_2}$ under the null hypothesis. The proposition also shows that the asymptotic distribution of the score test remains the $\chi^2_{d_2}$ distribution: the asymptotic distribution of $R_n$ is not affected by the fact that, under the null hypothesis, the parameter is at the boundary of the parameter space. These results are not very surprising. Take the example of an ARCH(1) with the hypothesis $H_0: \alpha_0 = 0$ of absence of ARCH effect. As illustrated by Figure 8.2, there is a nonzero probability that $\hat\alpha$ lies at the boundary, that is, that $\hat\alpha = 0$. Consequently $W_n = n\hat\alpha^2$ admits a mass at 0 and does not follow, even asymptotically, the $\chi^2_1$ law. The same conclusion can be drawn for the likelihood ratio test. On the contrary, the score $n^{1/2}\partial\tilde l_n(\theta_0)/\partial\theta$ can take positive as well as negative values, and does not seem to have a specific behavior when $\theta_0$ is at the boundary.


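Proposition 8.3 (Asymptotic distribution of the three statistics under $H_0$) Under $H_0$ and the assumptions of Theorem 8.1,

$$W_n \xrightarrow{\mathcal{L}} W = \lambda^{\Lambda\prime}\,\Omega\,\lambda^\Lambda, \qquad (8.22)$$

$$R_n \xrightarrow{\mathcal{L}} R \sim \chi^2_{d_2}, \qquad (8.23)$$

$$\begin{aligned} L_n \xrightarrow{\mathcal{L}} L &= -\frac{1}{2}\left(\lambda^\Lambda - Z\right)'J\left(\lambda^\Lambda - Z\right) + \frac{1}{2}Z'K'\left\{KJ^{-1}K'\right\}^{-1}KZ \\ &= -\frac{1}{2}\left\{\inf_{\lambda \in \Lambda}\|Z - \lambda\|_J^2 - \inf_{\lambda:\,K\lambda = 0}\|Z - \lambda\|_J^2\right\}, \end{aligned} \qquad (8.24)$$

where $\Omega = K'\left\{(\kappa_\eta - 1)KJ^{-1}K'\right\}^{-1}K$, $\|x\|_J^2 = x'Jx$, and $\lambda^\Lambda$ satisfies (8.20).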


Figure 8.2 Concentrated log-likelihood (solid line) $\alpha \mapsto \log L_n(\hat\omega, \alpha)$ for an ARCH(1) model. Assume there is no ARCH effect: the true value of the ARCH parameter is $\alpha_0 = 0$. In the configuration on the right, the likelihood maximum does not lie at the boundary and the three statistics $W_n$, $R_n$ and $L_n$ take strictly positive values. In the configuration on the left, we have $W_n = n\hat\alpha^2 = 0$ and $L_n = 2\{\log L_n(\hat\omega, \hat\alpha) - \log L_n(\hat\omega, 0)\} = 0$, whereas $R_n = \{\partial\log L_n(\hat\omega, 0)/\partial\alpha\}^2$ continues to take a strictly positive value.

Remark 8.3 (Equivalence of the statistics $W_n$ and $L_n$) Let $\hat\kappa_\eta$ be an estimator which converges in probability to $\kappa_\eta$. We can show that

$$W_n = \frac{2}{\hat\kappa_\eta - 1}L_n + o_P(1)$$

under the null hypothesis. The Wald and likelihood ratio tests will thus have the same asymptotic critical values, and will have the same local asymptotic powers (see Exercises 8.8 and 8.9, and Section 8.3.4). They may, however, have different asymptotic behaviors under nonlocal alternatives.


Remark 8.4 (Assumptions on the tests) In order for the Wald statistic to be well defined, $\Omega$ must exist, that is, $J = J(\theta_0)$ must exist and must be invertible. This is not the case, in particular, for a GARCH($p, q$) at $\theta_0 = (\omega_0, 0, \dots, 0)$, when $p \neq 0$. It is thus impossible to carry out a Wald test on the simultaneous nullity of all the $\alpha_i$ and $\beta_j$ coefficients in a GARCH($p, q$), $p \neq 0$. The assumptions of Theorem 8.1 are actually required, in particular the identifiability assumptions. It is thus impossible to test, for instance, an ARCH(1) against a GARCH(2, 1), but we can test, for instance, an ARCH(1) against an ARCH(3).

A priori, the asymptotic distributions $W$ and $L$ depend on $J$, and thus on nuisance parameters. We will consider two particular cases: the case where we test the nullity of only one GARCH coefficient and the case where we test the nullity of all the coefficients of an ARCH. In both cases the asymptotic laws of the test statistics are simpler and do not depend on nuisance parameters. In the second case, both the test statistics and their asymptotic laws are simplified.


8.3.3 Test for the Nullity of One Coefficient 


Consider the case $d_2 = 1$, which is perhaps the most interesting case and corresponds to testing the nullity of only one coefficient. In view of (8.15), the last component of $\lambda^\Lambda$ is equal to $Z_d^+$. We thus have

$$W_n \xrightarrow{\mathcal{L}} W = \frac{\left(K\lambda^\Lambda\right)^2}{K\Sigma K'} = \frac{\left(Z_d^+\right)^2}{\mathrm{Var}(Z_d)} = \left(Z^*\right)^2\mathbb{1}_{\{Z^* > 0\}},$$

where $Z^* \sim \mathcal{N}(0, 1)$. Using the symmetry of the Gaussian distribution, and the independence between $Z^{*2}$ and $\mathbb{1}_{\{Z^* > 0\}}$ when $Z^*$ follows the real normal law, we obtain

$$W_n \xrightarrow{\mathcal{L}} W \sim \frac{1}{2}\delta_0 + \frac{1}{2}\chi^2_1 \qquad (\text{where } \delta_0 \text{ denotes the Dirac measure at } 0).$$

Testing

$$H_0: \theta_0(p+q+1) = 0 \quad \text{against} \quad H_1: \theta_0(p+q+1) > 0$$

can thus be achieved by using the critical region $\{W_n > \chi^2_1(1 - 2\alpha)\}$ at the asymptotic level $\alpha < 1/2$. In view of Remark 8.3, we can define a modified likelihood ratio test of critical region $\{2L_n/(\hat\kappa_\eta - 1) > \chi^2_1(1 - 2\alpha)\}$. Note that the standard Wald test $\{W_n > \chi^2_1(1 - \alpha)\}$ has the asymptotic level $\alpha/2$, and that the asymptotic level of the standard likelihood ratio test $\{L_n > \chi^2_1(1 - \alpha)\}$ is much larger than $\alpha$ when the kurtosis coefficient $\kappa_\eta$ is large. A modified version of the Student $t$ test is defined by the rejection region

$$\left\{t_n > \Phi^{-1}(1 - \alpha)\right\}. \qquad (8.25)$$

We observe that widely used statistical software, such as GAUSS, R, RATS, SAS and SPSS, does not use the modified version (8.25), but the standard version (8.21). This standard test is not of asymptotic level $\alpha$ but only $\alpha/2$. To obtain a $t$ test of asymptotic level $\alpha$ it then suffices to use a test of nominal level $2\alpha$.
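These thresholds are easy to check numerically. The sketch below relies only on Python’s standard `statistics.NormalDist`; the helper names and the value $\alpha = 5\%$ are illustrative assumptions. It evaluates $\chi^2_1$ quantiles through the identity $\chi^2_1(p) = \{\Phi^{-1}((1+p)/2)\}^2$, compares the modified Wald and modified $t$ thresholds, and computes asymptotic levels under the limit law $\frac{1}{2}\delta_0 + \frac{1}{2}\chi^2_1$.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def chi2_1_quantile(p):
    """p-quantile of chi2_1, using chi2_1(p) = {Phi^{-1}((1 + p) / 2)}^2."""
    return N.inv_cdf((1.0 + p) / 2.0) ** 2


def limit_sf(x):
    """P(W > x), x > 0, for W ~ (1/2) delta_0 + (1/2) chi2_1: (1/2) * 2 {1 - Phi(sqrt(x))}."""
    return 1.0 - N.cdf(x ** 0.5)


alpha = 0.05
c_modified_wald = chi2_1_quantile(1.0 - 2.0 * alpha)        # {W_n > chi2_1(1 - 2 alpha)}
c_modified_t = N.inv_cdf(1.0 - alpha)                        # (8.25): {t_n > Phi^{-1}(1 - alpha)}
c_standard_wald = chi2_1_quantile(1.0 - alpha)               # {W_n > chi2_1(1 - alpha)}
c_t_nominal_2alpha = N.inv_cdf(1.0 - (2.0 * alpha) / 2.0)    # (8.21) used at nominal level 2 alpha

print(c_modified_wald, c_modified_t ** 2)    # same region, since t_n = sqrt(W_n)
print(limit_sf(c_modified_wald))             # asymptotic level of the modified Wald test
print(limit_sf(c_standard_wald))             # asymptotic level of the standard Wald test
```

With $\alpha = 5\%$, both modified thresholds equal about 2.7055 on the $W_n$ scale, the modified Wald test has asymptotic level 5%, the standard Wald test has level 2.5%, and the standard $t$ test used at nominal level $2\alpha$ coincides with (8.25).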


Example 8.6 (Empirical behavior of the tests under the null) We simulated 5000 independent samples of length $n = 100$ and $n = 1000$ of a strong (0, 1) white noise. On each realization we fitted an ARCH(1) model $\epsilon_t = \{\omega + \alpha\epsilon_{t-1}^2\}^{1/2}\eta_t$, by QML, and carried out tests of $H_0: \alpha = 0$ against $H_1: \alpha > 0$.

We began with the modified Wald test with rejection region

$$\left\{W_n = n\hat\alpha^2 > \chi^2_1(0.90) = 2.71\right\}.$$

This test is of asymptotic level 5%. For the sample size $n = 100$, we observed a relative rejection frequency of 6.22%. For $n = 1000$, we observed a relative rejection frequency of 5.38%, which is not significantly different from the theoretical 5%. Indeed, an elementary calculation shows that, on 5000 independent replications of the same experiment with success probability 5%, the success percentage should vary between 4.4% and 5.6% with a probability of approximately 95%. Figure 8.3 shows that the empirical distribution of $W_n$ is quite close to the asymptotic distribution $\delta_0/2 + \chi^2_1/2$, even for the small sample size $n = 100$.
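The band from 4.4% to 5.6% quoted above follows from the normal approximation to the binomial distribution. The sketch below is a minimal illustration (the function name `rejection_band` is hypothetical): it computes $p \pm z_{0.975}\sqrt{p(1-p)/N}$ for $N = 5000$ replications and $p = 5\%$, and compares the rejection frequencies reported in this example with it.

```python
from statistics import NormalDist


def rejection_band(p=0.05, n_rep=5000, coverage=0.95):
    """Normal-approximation band for the rejection frequency of a size-p test
    observed on n_rep independent replications."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    half_width = z * (p * (1.0 - p) / n_rep) ** 0.5
    return p - half_width, p + half_width


low, high = rejection_band()
print(f"band: {100 * low:.1f}% to {100 * high:.1f}%")   # band: 4.4% to 5.6%

# Relative rejection frequencies reported in Example 8.6
observed = {"Wald, n=100": 0.0622, "Wald, n=1000": 0.0538,
            "score, n=100": 0.0340, "score, n=1000": 0.0432,
            "LR, n=100": 0.0320, "LR, n=1000": 0.0414}
for label, freq in observed.items():
    print(label, "inside" if low <= freq <= high else "outside")
```

With these inputs, the $n = 1000$ Wald frequency (5.38%) lies inside the band, whereas the $n = 1000$ score and likelihood ratio frequencies (4.32% and 4.14%) fall slightly below its lower end.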
We then carried out the score test defined by the rejection region

$$\left\{R_n = nR^2 > \chi^2_1(0.95) = 3.84\right\},$$

where $R^2$ is the determination coefficient of the regression of $\epsilon_t^2$ on 1 and $\epsilon_{t-1}^2$. This test is also of asymptotic level 5%. For the sample size $n = 100$, we observed a relative rejection frequency of 3.40%. For $n = 1000$, we observed a relative frequency of 4.32%.

We also used the modified likelihood ratio test. For the sample size $n = 100$, we observed a relative rejection frequency of 3.20%, and for $n = 1000$ we observed 4.14%.

In these simulation experiments, the Type I error is thus slightly better controlled by the modified Wald test than by the score and modified likelihood ratio tests.


Example 8.7 (Comparison of the tests under the alternative hypothesis) We implemented the $W_n$, $R_n$ and $L_n$ tests of the null hypothesis $H_0: \alpha_{01} = 0$ in the ARCH(1) model $\epsilon_t = \{\omega_0 + \alpha_{01}\epsilon_{t-1}^2\}^{1/2}\eta_t$, $\eta_t \sim \mathcal{N}(0, 1)$. Figure 8.4 compares the observed powers of the three tests, that is, the relative frequency of rejection of the hypothesis $H_0$ that there is no ARCH effect, on 5000 independent realizations of length $n = 100$ and $n = 1000$ of an ARCH(1) model with $\alpha_{01} = 0.2$ when $n = 100$, and $\alpha_{01} = 0.05$ when $n = 1000$. On these simulated series, the modified Wald test turns out to be the most powerful.

Figure 8.3 Comparison between a kernel density estimator of the Wald statistic (dotted line) and the $\chi^2_1/2$ density on $[0.5, \infty)$ (solid line) on 5000 simulations of an ARCH(1) process with $\alpha_{01} = 0$: (left) for sample size $n = 100$; (right) for $n = 1000$.

Figure 8.4 Comparison of the observed powers (vertical axis: % of rejections) of the Wald test (thick line), of the score test (dotted line) and of the likelihood ratio test (thin line), as a function of the nominal level $\alpha$, on 5000 simulations of an ARCH(1) process: (left) for $n = 100$ and $\alpha_{01} = 0.2$; (right) for $n = 1000$ and $\alpha_{01} = 0.05$.


8.3.4 Conditional Homoscedasticity Tests with ARCH Models 


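Another interesting case is that obtained with $d_1 = 1$, $\theta^{(1)} = \omega$, $p = 0$ and $d_2 = q$. This case corresponds to the test of the conditional homoscedasticity null hypothesis

$$H_0: \alpha_{01} = \cdots = \alpha_{0q} = 0 \qquad (8.26)$$

in an ARCH($q$) model

$$\begin{cases} \epsilon_t = \sigma_t\eta_t, \quad (\eta_t) \text{ iid } (0, 1), \\ \sigma_t^2 = \omega + \sum_{i=1}^q \alpha_i\epsilon_{t-i}^2, \quad \omega > 0, \ \alpha_i \geq 0. \end{cases} \qquad (8.27)$$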


We will see that for testing (8.26) there exist very simple forms of the Wald and score statistics. 


Simplified Form of the Wald Statistic 


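Using Exercise 8.6, we have

$$\Sigma(\theta_0) = (\kappa_\eta - 1)J^{-1} = \begin{pmatrix} (\kappa_\eta + q - 1)\omega_0^2 & -\omega_0\mathbf{1}_q' \\ -\omega_0\mathbf{1}_q & I_q \end{pmatrix}, \qquad \mathbf{1}_q = (1, \dots, 1)' \in \mathbb{R}^q.$$

Since $K\Sigma K' = I_q$, we obtain a very simple form for the Wald statistic:

$$W_n = n\sum_{i=1}^q \hat\alpha_i^2. \qquad (8.28)$$

Asymptotic Distribution $W$ and $L$


A trivial extension of Example 8.3 yields

$$\lambda^\Lambda = \begin{pmatrix} Z_1 + \omega_0\sum_{i=2}^{q+1} Z_i^- \\ Z_2^+ \\ \vdots \\ Z_{q+1}^+ \end{pmatrix}. \qquad (8.29)$$

The asymptotic distribution of $n\sum_{i=1}^q \hat\alpha_i^2$ is thus that of

$$\sum_{i=2}^{q+1}\left(Z_i^+\right)^2,$$

where the $Z_i$ are independent $\mathcal{N}(0, 1)$. Thus, in the case where an ARCH($q$) is fitted to a white noise we have

$$W_n = n\sum_{i=1}^q \hat\alpha_i^2 \xrightarrow{\mathcal{L}} \frac{1}{2^q}\delta_0 + \sum_{i=1}^q \binom{q}{i}\frac{1}{2^q}\chi^2_i. \qquad (8.30)$$

This asymptotic distribution is tabulated and the critical values are given in Table 8.2. In view of Remark 8.3, Table 8.2 also yields the asymptotic critical values of the modified likelihood ratio statistic $2L_n/(\hat\kappa_\eta - 1)$. Table 8.3 shows that the use of the standard $\chi^2_q(1-\alpha)$-based critical values of the Wald test would lead to large discrepancies between the asymptotic levels and the nominal level $\alpha$.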
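The entries of Tables 8.2 and 8.3 can be computed from (8.30). The sketch below uses only Python’s `math` module, and its function names are hypothetical. It evaluates the $\chi^2_k$ survival function through the recursion $P(\chi^2_{k+2} > x) = P(\chi^2_k > x) + (x/2)^{k/2}e^{-x/2}/\Gamma(k/2 + 1)$, starting from $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ and $P(\chi^2_2 > x) = e^{-x/2}$. It then obtains by bisection both the critical values $c_{q,\alpha}$ and the asymptotic levels of the erroneous tests based on $\chi^2_q(1-\alpha)$. It can be used to cross-check the tabulated values.

```python
import math


def chi2_sf(x, k):
    """P(chi2_k > x) for an integer k >= 1, via Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1)."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if k % 2 == 1:
        sf, a = math.erfc(math.sqrt(y)), 0.5   # P(chi2_1 > x)
    else:
        sf, a = math.exp(-y), 1.0              # P(chi2_2 > x)
    while a < k / 2.0:
        sf += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return sf


def limit_sf(x, q):
    """P(W > x), x > 0, for W ~ 2^{-q} delta_0 + sum_{i=1}^q C(q, i) 2^{-q} chi2_i, as in (8.30)."""
    return sum(math.comb(q, i) * chi2_sf(x, i) for i in range(1, q + 1)) / 2.0 ** q


def bisect_decreasing(f, target, lo=1e-12, hi=200.0, tol=1e-10):
    """Solve f(x) = target for a decreasing function f by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_value(q, alpha):
    """c_{q, alpha} of Table 8.2: P(W > c) = alpha."""
    return bisect_decreasing(lambda x: limit_sf(x, q), alpha)


def erroneous_level(q, alpha):
    """Asymptotic level of {n sum alpha_i^2 > chi2_q(1 - alpha)} under (8.30), as in Table 8.3."""
    chi2_quantile = bisect_decreasing(lambda x: chi2_sf(x, q), alpha)
    return limit_sf(chi2_quantile, q)


levels = (0.001, 0.01, 0.025, 0.05, 0.10, 0.15)
for q in range(1, 6):
    print(q, [round(critical_value(q, a), 4) for a in levels],
          [round(100 * erroneous_level(q, a), 2) for a in levels])
```

For $q = 1$ the mixture reduces to $\frac{1}{2}\delta_0 + \frac{1}{2}\chi^2_1$, so $c_{1,\alpha} = \chi^2_1(1 - 2\alpha)$ and the erroneous level is exactly $\alpha/2$, which is the first row of both tables.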


Table 8.2 Asymptotic critical value $c_{q,\alpha}$, at level $\alpha$, of the Wald test of rejection region $\{n\sum_{i=1}^q \hat\alpha_i^2 > c_{q,\alpha}\}$ for the conditional homoscedasticity hypothesis $H_0: \alpha_1 = \cdots = \alpha_q = 0$ in an ARCH($q$) model.

                              α (%)
q        0.1         1       2.5         5        10        15
1     9.5493    5.4119    3.8414    2.7055    1.6424    1.0742
2    11.7625    7.2895    5.5369    4.2306    2.9524    2.2260
3    13.4740    8.7464    6.8610    5.4345    4.0102    3.1802
4    14.9619   10.0186    8.0230    6.4979    4.9553    4.0428
5    16.3168   11.1828    9.0906    7.4197    5.8351    4.8519


Table 8.3 Exact asymptotic level (%) of erroneous Wald tests, of rejection region $\{n\sum_{i=1}^q \hat\alpha_i^2 > \chi^2_q(1-\alpha)\}$, under the conditional homoscedasticity assumption $H_0: \alpha_1 = \cdots = \alpha_q = 0$ in an ARCH($q$) model.

                         α (%)
q       0.1       1     2.5       5      10      15
1      0.05     0.5    1.25     2.5       5     7.5
2      0.04     0.4    0.96    1.97    4.09    6.32
3      0.02    0.28    0.75    1.57    3.36    5.29
4      0.02    0.22    0.59    1.28    2.79    4.47
5      0.01    0.17    0.48    1.05    2.34    3.81


Score Test


For the hypothesis (8.26) that all the $\alpha$ coefficients of an ARCH($q$) model are equal to zero, the score statistic $R_n$ can be simplified. To work within the linear regression framework, write

$$\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta} = n^{-1}X'Y,$$

where $Y$ is the vector of length $n$ of the ‘dependent’ variable $1 - \epsilon_t^2/\hat\omega$, $X$ is the $n \times (q+1)$ matrix of the constant $\hat\omega^{-1}$ (in the first column) and of the ‘explanatory’ variables $\epsilon_{t-i}^2/\hat\omega$ (in column $i+1$, with the convention $\epsilon_t = 0$ for $t \leq 0$), and $\hat\omega = \hat\theta_{n|2}^{(1)} = n^{-1}\sum_{t=1}^n \epsilon_t^2$. Estimating $J(\theta_0)$ by $n^{-1}X'X$, and $\kappa_\eta - 1$ by $n^{-1}Y'Y$, we obtain

$$R_n = n\,\frac{Y'X(X'X)^{-1}X'Y}{Y'Y},$$

and one recognizes $n$ times the coefficient of determination in the linear regression of $Y$ on the columns of $X$. Since this coefficient is not changed by linear transformation of the variables (see Exercise 5.11), we simply have $R_n = nR^2$, where $R^2$ is the coefficient of determination in the regression of $\epsilon_t^2$ on a constant and $q$ lagged values $\epsilon_{t-1}^2, \dots, \epsilon_{t-q}^2$. Under the null hypothesis of conditional homoscedasticity, $R_n$ asymptotically follows the $\chi^2_q$ law.
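A minimal standard-library sketch of this score statistic, assuming the observations $\epsilon_1, \dots, \epsilon_n$ are given as a Python list. It builds $Y$ and $X$ as described above (with $\epsilon_t = 0$ for $t \leq 0$) and computes $nY'X(X'X)^{-1}X'Y/Y'Y$. It then checks that this coincides with $n$ times the centered $R^2$ of the regression of $\epsilon_t^2$ on a constant and $q$ lags. The function names are hypothetical, the small linear system is solved by Gaussian elimination instead of a library routine, and the input series is a simulated Gaussian white noise used only for illustration.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def explained_ss(X, y):
    """y' X (X'X)^{-1} X' y, with X given as a list of rows."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum(b * v for b, v in zip(beta, xty))


def design(e2, q, scale=1.0):
    """Rows (1, e2[t-1], ..., e2[t-q]) / scale, using eps_t = 0 for t <= 0."""
    return [[1.0 / scale] + [(e2[t - i] if t >= i else 0.0) / scale for i in range(1, q + 1)]
            for t in range(len(e2))]


def score_stat(eps, q):
    """R_n = n Y'X(X'X)^{-1}X'Y / Y'Y with Y_t = 1 - eps_t^2 / omega_hat."""
    n = len(eps)
    e2 = [e * e for e in eps]
    omega_hat = sum(e2) / n                  # constrained QMLE of omega under H0
    Y = [1.0 - v / omega_hat for v in e2]
    X = design(e2, q, scale=omega_hat)
    return n * explained_ss(X, Y) / sum(y * y for y in Y)


def n_r_squared(eps, q):
    """n times the centered R^2 of eps_t^2 on a constant and q lagged squares."""
    n = len(eps)
    e2 = [e * e for e in eps]
    mean = sum(e2) / n
    tss = sum((v - mean) ** 2 for v in e2)
    ess = explained_ss(design(e2, q), e2) - n * mean ** 2   # intercept included
    return n * ess / tss


rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(500)]   # illustrative simulated white noise
print(score_stat(eps, 2), n_r_squared(eps, 2))    # the two numbers coincide
```

The two computations agree because $Y$ has mean zero (since $\hat\omega$ is the empirical mean of $\epsilon_t^2$) and is an affine transformation of $\epsilon_t^2$, while $X$ spans the same columns as the constant and the lagged squares. Under $H_0$ the statistic is compared with $\chi^2_q(1-\alpha)$.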

The previous simple forms of the Wald and score tests are obtained with estimators of J which 
exploit the particular form of the matrix under the null. Note that there exist other versions of these 
tests, obtained with other consistent estimators of J. The different versions are equivalent under 
the null, but can have different asymptotic behaviors under the alternative. 


8.3.5 Asymptotic Comparison of the Tests 


The Wald and score tests that we have just defined are in general consistent, that is, their powers 
converge to 1 when they are applied to a wide class of conditionally heteroscedastic processes. An 
asymptotic study will be conducted via two different approaches: Bahadur’s approach compares the 
rates of convergence to zero of the p-values under fixed alternatives, whereas Pitman’s approach 
compares the asymptotic powers under a sequence of local alternatives, that is, a sequence of 
alternatives tending to the null as the sample size increases. 


Bahadur’s Approach 


Let $S_W(t) = P(W > t)$ and $S_R(t) = P(R > t)$ be the asymptotic survival functions of the two test statistics, under the null hypothesis $H_0$ defined by (8.26). Consider, for instance, the Wald test. Under the alternative of an ARCH($q$) which does not satisfy $H_0$, the $p$-value of the Wald test $S_W(W_n)$ converges almost surely to zero as $n \to \infty$ because

$$\frac{W_n}{n} = \sum_{i=1}^q \hat\alpha_i^2 \to \sum_{i=1}^q \alpha_i^2 \neq 0.$$

The $p$-value of a test is typically equivalent to $\exp\{-nc/2\}$, where $c$ is a positive constant called the Bahadur slope. Using the fact that

$$\log S_W(x) \sim \log P(\chi^2_q > x), \qquad x \to \infty, \qquad (8.31)$$

and that $\log P(\chi^2_q > x) \sim -x/2$ as $x \to \infty$, the (approximate¹) Bahadur slope of the Wald test is thus

$$\lim_{n\to\infty} -\frac{2}{n}\log S_W(W_n) = \sum_{i=1}^q \alpha_i^2, \quad \text{a.s.}$$
To compute the Bahadur slope of the score test, note that we have the linear regression model $\epsilon_t^2 = \omega + A(B)\epsilon_t^2 + \nu_t$, where $\nu_t = (\eta_t^2 - 1)\sigma_t^2(\theta_0)$ is the linear innovation of $\epsilon_t^2$. We then have

$$\frac{R_n}{n} = R^2 \to 1 - \frac{\mathrm{Var}(\nu_t)}{\mathrm{Var}(\epsilon_t^2)} \quad \text{a.s.}$$

The previous limit is thus equal to the Bahadur slope of the score test. The comparison of the two slopes favors the score test over the Wald test.


Proposition 8.4 Let $(\epsilon_t)$ be a strictly stationary and nonanticipative solution of the ARCH($q$) model (8.27), with $E(\epsilon_t^4) < \infty$ and $\sum_{i=1}^q \alpha_{0i} > 0$. The score test is considered more efficient than the Wald test in Bahadur’s sense because its slope is always greater than or equal to that of the Wald test, with equality when $q = 1$.


Example 8.8 (Slopes in the ARCH(1) and ARCH(2) cases) The slopes are the same in the ARCH(1) case because

$$\lim_{n\to\infty}\frac{W_n}{n} = \lim_{n\to\infty}\frac{R_n}{n} = \alpha_{01}^2.$$

In the ARCH(2) case with fourth-order moment, we have

$$\lim_{n\to\infty}\frac{W_n}{n} = \alpha_{01}^2 + \alpha_{02}^2, \qquad \lim_{n\to\infty}\frac{R_n}{n} = \alpha_{01}^2 + \alpha_{02}^2 + \frac{2\alpha_{01}^2\alpha_{02}}{1 - \alpha_{02}}.$$

We see that the second limit is always greater than or equal to the first, strictly so when $\alpha_{01}\alpha_{02} > 0$. Consequently, in Bahadur’s sense, the Wald and Rao tests have the same asymptotic efficiency in the ARCH(1) case. In the ARCH(2) case, the score test is, still in Bahadur’s sense, asymptotically more efficient than the Wald test for testing the conditional homoscedasticity (that is, $\alpha_1 = \alpha_2 = 0$).
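The two ARCH(2) limits can be evaluated numerically. In the sketch below, the function names and parameter values are illustrative assumptions, and the values are assumed small enough for the fourth-order moment to exist. The score slope is computed as the $R^2$ of the AR(2) representation of $\epsilon_t^2$, using the Yule-Walker relations $\rho_1 = \alpha_1/(1 - \alpha_2)$, $\rho_2 = \alpha_1\rho_1 + \alpha_2$ and $1 - \mathrm{Var}(\nu_t)/\mathrm{Var}(\epsilon_t^2) = \alpha_1\rho_1 + \alpha_2\rho_2$, and it is compared with the closed form of the example.

```python
def wald_slope(a1, a2):
    """lim W_n / n = alpha_1^2 + alpha_2^2 (Example 8.8)."""
    return a1 ** 2 + a2 ** 2


def score_slope_yule_walker(a1, a2):
    """lim R_n / n = R^2 of the AR(2) form of eps_t^2, from its autocorrelations.

    Yule-Walker: rho_1 = a1 / (1 - a2), rho_2 = a1 rho_1 + a2, and
    1 - Var(nu_t) / Var(eps_t^2) = a1 rho_1 + a2 rho_2.
    """
    rho1 = a1 / (1.0 - a2)
    rho2 = a1 * rho1 + a2
    return a1 * rho1 + a2 * rho2


def score_slope_closed_form(a1, a2):
    """Closed form of Example 8.8."""
    return a1 ** 2 + a2 ** 2 + 2.0 * a1 ** 2 * a2 / (1.0 - a2)


for a1, a2 in [(0.2, 0.0), (0.2, 0.3), (0.3, 0.2)]:
    print(a1, a2, wald_slope(a1, a2),
          score_slope_yule_walker(a1, a2), score_slope_closed_form(a1, a2))
```

For $(\alpha_{01}, \alpha_{02}) = (0.2, 0.3)$, for instance, the Wald slope is 0.13 and the score slope is $0.13 + 0.024/0.7 \approx 0.164$, while the two slopes coincide when $\alpha_{02} = 0$.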


Bahadur’s approach is sometimes criticized for not taking account of the critical value of the test, and thus for not really comparing the powers. This approach only takes into account the (asymptotic) distribution of the statistic under the null and the rate of divergence of the statistic under the alternative. It is unable to distinguish a two-sided test from its one-sided counterpart (see Exercise 8.8). In this sense the result of Proposition 8.4 must be put into perspective.


Pitman’s Approach 


In the ARCH(1) case, consider a sequence of local alternatives $H_n(\tau): \alpha_1 = \tau/\sqrt{n}$. We can show that under this sequence of alternatives,

$$W_n = n\hat\alpha_1^2 \xrightarrow{\mathcal{L}} (U + \tau)^2\mathbb{1}_{\{U + \tau > 0\}}, \qquad U \sim \mathcal{N}(0, 1).$$

Consequently, the local asymptotic power of the Wald test is

$$P(U + \tau > c_1) = 1 - \Phi(c_1 - \tau), \qquad c_1 = \Phi^{-1}(1 - \alpha). \qquad (8.32)$$

¹ The term ‘approximate’ is used by Bahadur (1960) to emphasize that the exact survival function $S_{W_n}(t)$ is approximated by the asymptotic survival function $S_W(t)$. See also Bahadur (1967) for a discussion on the exact and approximate slopes.

The score test has the local asymptotic power

$$P\left\{(U + \tau)^2 > c_2^2\right\} = 1 - \Phi(c_2 - \tau) + \Phi(-c_2 - \tau), \qquad c_2 = \Phi^{-1}(1 - \alpha/2). \qquad (8.33)$$

Note that (8.32) is the power of the test of the assumption $H_0: \theta = 0$ against the assumption $H_1: \theta = \tau > 0$, based on the rejection region $\{X > c_1\}$ with only one observation $X \sim \mathcal{N}(\theta, 1)$. The power (8.33) is that of the two-sided test $\{|X| > c_2\}$. The tests $\{X > c_1\}$ and $\{|X| > c_2\}$ have the same level, but the first test is uniformly more powerful than the second (by the Neyman-Pearson lemma, $\{X > c_1\}$ is even uniformly more powerful than any test of level less than or equal to $\alpha$, for any one-sided alternative of the form $H_1$). The local asymptotic power of the Wald test is thus uniformly strictly greater than that of Rao’s test for testing for conditional homoscedasticity in an ARCH(1) model.
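The two local power functions (8.32) and (8.33) can be tabulated with the standard library. This is a small illustrative sketch with hypothetical function names and an assumed level $\alpha = 5\%$.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is Phi, N.inv_cdf is Phi^{-1}


def wald_local_power(tau, alpha=0.05):
    """(8.32): 1 - Phi(c1 - tau), with c1 = Phi^{-1}(1 - alpha)."""
    c1 = N.inv_cdf(1.0 - alpha)
    return 1.0 - N.cdf(c1 - tau)


def score_local_power(tau, alpha=0.05):
    """(8.33): 1 - Phi(c2 - tau) + Phi(-c2 - tau), with c2 = Phi^{-1}(1 - alpha/2)."""
    c2 = N.inv_cdf(1.0 - alpha / 2.0)
    return 1.0 - N.cdf(c2 - tau) + N.cdf(-c2 - tau)


for tau in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"tau = {tau}: Wald {wald_local_power(tau):.3f}, score {score_local_power(tau):.3f}")
```

Both powers equal $\alpha$ at $\tau = 0$, and the Wald power is larger for every $\tau > 0$, as the one-sided versus two-sided argument above predicts.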

Consider the ARCH(2) case, and a sequence of local alternatives $H_n(\tau): \alpha_1 = \tau_1/\sqrt{n}$, $\alpha_2 = \tau_2/\sqrt{n}$. Under this sequence of alternatives

$$W_n = n\left(\hat\alpha_1^2 + \hat\alpha_2^2\right) \xrightarrow{\mathcal{L}} (U_1 + \tau_1)^2\mathbb{1}_{\{U_1 + \tau_1 > 0\}} + (U_2 + \tau_2)^2\mathbb{1}_{\{U_2 + \tau_2 > 0\}},$$

with $(U_1, U_2)' \sim \mathcal{N}(0, I_2)$. Let $c_1$ be the critical value of the Wald test of level $\alpha$. Denoting by $\phi$ the $\mathcal{N}(0, 1)$ density, the local asymptotic power of the Wald test is

$$\begin{aligned} & P\left(U_1 + \tau_1 > \sqrt{c_1}\right)P\left(U_2 + \tau_2 < 0\right) + \int_{-\tau_2}^{\infty} P\left\{(U_1 + \tau_1)^2\mathbb{1}_{\{U_1 + \tau_1 > 0\}} > c_1 - (x + \tau_2)^2\right\}\phi(x)\,dx \\ &\quad = \left\{1 - \Phi\left(\sqrt{c_1} - \tau_1\right)\right\}\Phi(-\tau_2) + 1 - \Phi\left(-\tau_2 + \sqrt{c_1}\right) \\ &\qquad + \int_{-\tau_2}^{\sqrt{c_1} - \tau_2}\left[1 - \Phi\left\{\sqrt{c_1 - (x + \tau_2)^2} - \tau_1\right\}\right]\phi(x)\,dx. \end{aligned}$$

Let $c_2$ be the critical value of the Rao test of level $\alpha$. The local asymptotic power of the Rao test is

$$P\left\{(U_1 + \tau_1)^2 + (U_2 + \tau_2)^2 > c_2\right\},$$

where $(U_1 + \tau_1)^2 + (U_2 + \tau_2)^2$ follows a noncentral $\chi^2$ distribution, with two degrees of freedom and noncentrality parameter $\tau_1^2 + \tau_2^2$. Figure 8.5 compares the powers of the two tests when $\tau_1 = \tau_2$.

Figure 8.5 Local asymptotic power of the Wald test (solid line) and of the Rao test (dotted line) for testing for conditional homoscedasticity in an ARCH(2) model.

Thus, the comparison of the local asymptotic powers clearly favors the Wald test over the score test, counterbalancing the result of Proposition 8.4.
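The ARCH(2) local power functions can be evaluated numerically, which is one way to produce a comparison such as Figure 8.5. This standard-library sketch assumes $\alpha = 5\%$, so that $c_1 = 4.2306$ is taken from Table 8.2 ($q = 2$) and $c_2 = \chi^2_2(0.95) = -2\log 0.05$. The function names are illustrative. The Wald power uses the three-term formula above with a composite Simpson rule for the integral. The Rao power uses the Poisson-mixture representation of the noncentral $\chi^2_2$ law, together with $P(\chi^2_{2k} > c) = e^{-c/2}\sum_{m<k}(c/2)^m/m!$.

```python
import math
from statistics import NormalDist

N = NormalDist()   # standard normal: N.cdf is Phi, N.pdf is phi


def wald_power_arch2(tau1, tau2, c1=4.2306, n_grid=2000):
    """Local asymptotic power of the Wald test, ARCH(2) case (c1: 5% value of Table 8.2, q = 2)."""
    r = math.sqrt(c1)
    closed_part = (1.0 - N.cdf(r - tau1)) * N.cdf(-tau2) + 1.0 - N.cdf(r - tau2)

    def integrand(x):
        inside = max(c1 - (x + tau2) ** 2, 0.0)   # guard against rounding below 0
        return (1.0 - N.cdf(math.sqrt(inside) - tau1)) * N.pdf(x)

    # Composite Simpson rule on [-tau2, sqrt(c1) - tau2] (n_grid must be even)
    a, b = -tau2, r - tau2
    h = (b - a) / n_grid
    s = integrand(a) + integrand(b)
    for k in range(1, n_grid):
        s += (4.0 if k % 2 else 2.0) * integrand(a + k * h)
    return closed_part + s * h / 3.0


def rao_power_arch2(tau1, tau2, alpha=0.05, n_terms=200):
    """P{noncentral chi2_2(tau1^2 + tau2^2) > c2}, with c2 = chi2_2(1 - alpha) = -2 log(alpha)."""
    half_c = -math.log(alpha)                 # c2 / 2
    half_lam = (tau1 ** 2 + tau2 ** 2) / 2.0
    weight = math.exp(-half_lam)              # Poisson(half_lam) probability of j = 0
    term = math.exp(-half_c)                  # e^{-c/2} (c/2)^j / j! for j = 0
    tail, total = 0.0, 0.0
    for j in range(n_terms):
        tail += term                          # tail = P(chi2_{2j+2} > c2)
        total += weight * tail
        term *= half_c / (j + 1)
        weight *= half_lam / (j + 1)
    return total


for tau in (0.0, 1.0, 2.0, 3.0):
    print(f"tau1 = tau2 = {tau}: Wald {wald_power_arch2(tau, tau):.3f}, "
          f"Rao {rao_power_arch2(tau, tau):.3f}")
```

At $\tau_1 = \tau_2 = 0$ both functions return the level (about 5%), and for $\tau_1 = \tau_2 = 2$ the Wald power exceeds the Rao power, consistent with the conclusion drawn from Figure 8.5.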


8.4 Diagnostic Checking with Portmanteau Tests 


To check the adequacy of a given time series model, for instance an ARMA($p, q$) model, it is common practice to test the significance of the residual autocorrelations. In the GARCH framework this approach is not relevant because the process $\hat\eta_t = \epsilon_t/\hat\sigma_t$ is always a white noise (possibly a martingale difference) even when the volatility is misspecified, that is, when $\epsilon_t = \sqrt{h_t}\eta_t$ with $h_t \neq \sigma_t^2$. To check the adequacy of a volatility model, for instance a GARCH($p, q$) of the form (7.1), it is much more fruitful to look at the squared residual autocovariances

$$\hat r(h) = \frac{1}{n}\sum_{t=|h|+1}^n \left(\hat\eta_t^2 - 1\right)\left(\hat\eta_{t-|h|}^2 - 1\right), \qquad \hat\eta_t^2 = \frac{\epsilon_t^2}{\hat\sigma_t^2},$$

where $|h| < n$, $\hat\sigma_t = \tilde\sigma_t(\hat\theta_n)$, $\tilde\sigma_t$ is defined by (7.4) and $\hat\theta_n$ is the QMLE given by (7.9).
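For any fixed integer $m$, $1 \leq m < n$, consider the statistic $\hat r_m = (\hat r(1), \dots, \hat r(m))'$. Let $\hat\kappa_\eta$ and $\hat J$ be weakly consistent estimators of $\kappa_\eta$ and $J$. For instance, one can take

$$\hat\kappa_\eta = \frac{1}{n}\sum_{t=1}^n \frac{\epsilon_t^4}{\tilde\sigma_t^4(\hat\theta_n)}, \qquad \hat J = \frac{1}{n}\sum_{t=1}^n \frac{1}{\tilde\sigma_t^4(\hat\theta_n)}\frac{\partial\tilde\sigma_t^2(\hat\theta_n)}{\partial\theta}\frac{\partial\tilde\sigma_t^2(\hat\theta_n)}{\partial\theta'}.$$

Define also the $m \times (p + q + 1)$ matrix $\hat C_m$ whose $(h, k)$th element, for $1 \leq h \leq m$ and $1 \leq k \leq p + q + 1$, is given by

$$\hat C_m(h, k) = -\frac{1}{n}\sum_{t=h+1}^n \left(\hat\eta_{t-h}^2 - 1\right)\frac{1}{\tilde\sigma_t^2(\hat\theta_n)}\frac{\partial\tilde\sigma_t^2(\hat\theta_n)}{\partial\theta_k}.$$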


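Theorem 8.2 (Asymptotic distribution of a portmanteau test statistic) Under the assumptions of Theorem 7.2 ensuring the consistency and asymptotic normality of the QMLE,

$$n\,\hat r_m'\hat D^{-1}\hat r_m \xrightarrow{\mathcal{L}} \chi^2_m,$$

with $\hat D = (\hat\kappa_\eta - 1)^2 I_m - (\hat\kappa_\eta - 1)\hat C_m\hat J^{-1}\hat C_m'$.

The adequacy of the GARCH($p, q$) model is rejected at the asymptotic level $\alpha$ when

$$\left\{n\,\hat r_m'\hat D^{-1}\hat r_m > \chi^2_m(1 - \alpha)\right\}.$$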
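Once the standardized residuals $\hat\eta_t$ and the estimates $\hat\kappa_\eta$, $\hat J$ and $\hat C_m$ are available, the statistic of Theorem 8.2 is a quadratic form. The sketch below takes them as inputs and does not fit the GARCH model. Its function names are hypothetical, and $\hat C_m$ and $\hat J$ are assumed to have been computed beforehand from the formulas above. The demonstration uses simulated Gaussian "residuals" with $\hat C_m = 0$ and $\hat J = I$, purely to exercise the code. In that degenerate case the statistic reduces to $n\sum_h \hat r(h)^2/(\hat\kappa_\eta - 1)^2$, which ignores the estimation effect.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def squared_residual_autocov(eta, m):
    """r_hat(h) = n^{-1} sum_{t=h+1}^{n} (eta_t^2 - 1)(eta_{t-h}^2 - 1), for h = 1, ..., m."""
    n = len(eta)
    u = [e * e - 1.0 for e in eta]
    return [sum(u[t] * u[t - h] for t in range(h, n)) / n for h in range(1, m + 1)]


def portmanteau_stat(eta, m, kappa_hat, C_hat, J_hat):
    """n r_m' D^{-1} r_m, with D = (kappa_hat - 1)^2 I_m - (kappa_hat - 1) C_hat J_hat^{-1} C_hat'.

    C_hat is an m x d list of rows and J_hat a d x d list of rows.
    """
    n = len(eta)
    r = squared_residual_autocov(eta, m)
    d = len(J_hat)
    JinvCt = [solve(J_hat, C_hat[h]) for h in range(m)]   # row h: J^{-1} C_hat[h]'
    k1 = kappa_hat - 1.0
    D = [[k1 ** 2 * (i == j) - k1 * sum(C_hat[i][l] * JinvCt[j][l] for l in range(d))
          for j in range(m)] for i in range(m)]
    x = solve(D, r)
    return n * sum(ri * xi for ri, xi in zip(r, x))


# Illustration with hypothetical inputs: Gaussian "residuals", C_hat = 0 and J_hat = I
rng = random.Random(3)
eta = [rng.gauss(0.0, 1.0) for _ in range(1000)]
m, d = 3, 3
C_zero = [[0.0] * d for _ in range(m)]
J_id = [[float(i == j) for j in range(d)] for i in range(d)]
stat = portmanteau_stat(eta, m, kappa_hat=3.0, C_hat=C_zero, J_hat=J_id)
print(stat)   # compare with chi2_m(1 - alpha), e.g. about 7.81 for m = 3 and alpha = 5%
```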


8.5 Application: Is the GARCH(1,1) Model Overrepresented?


The GARCH(1,1) model is by far the most widely used by practitioners who wish to estimate the volatility of daily returns. In general, this model is chosen a priori, without implementing any statistical identification procedure. This practice is motivated by the common belief that the GARCH(1,1) (or its simplest asymmetric extensions) is sufficient to capture the properties of financial series and that higher-order models may be unnecessarily complicated.

We will show that, for a large number of series, this practice is not always statistically justified. 
We consider daily and weekly series of 11 returns (CAC, DAX, DJA, DJI, DJT, DJU, FTSE, 
Nasdaq, Nikkei, SMI and S&P 500) and five exchange rates. The observations cover the period 
from January 2, 1990 to January 22, 2009 for the daily returns and exchange rates, and from 
January 2, 1990 to January 20, 2009 for the weekly returns (except for the indices for which the 
first observations are after 1990). We begin with the portmanteau tests defined in Section 8.4. 
Table 8.4 shows that the ARCH models (even with large order q) are generally rejected, whereas 
the GARCH(1,1) is only occasionally rejected. This table only concerns the daily returns, but 
similar conclusions hold for the weekly returns and exchange rates. The portmanteau tests are 
known to be omnibus tests, powerful for a broad spectrum of alternatives. As we will now see, 
for the specific alternatives for which they are built, the tests defined in Section 8.3 (Wald, score 
and likelihood ratio) may be much more powerful. 

The GARCH(1,1) model is chosen as the benchmark model, and is successively tested against 
the GARCH(1,2), GARCH(1,3), GARCH(1,4) and GARCH(2,1) models. In each case, the three 
tests (Wald, score and likelihood ratio) are applied. The empirical p-values are displayed in 
Table 8.5. This table shows that: (i) the results of the tests strongly depend on the alternative;


Table 8.4 Portmanteau test p-values for adequacy of the ARCH(5) and GARCH(1,1) models for daily returns of stock market indices, based on m squared residual autocovariances. p-values less than 5% are in bold, those less than 1% are underlined.

                                                      m
            1      2      3      4      5      6      7      8      9     10     11     12

Portmanteau tests for adequacy of the ARCH(5)
CAC      0.194  0.010  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
DAX      0.506  0.157  0.140  0.049  0.044  0.061  0.080  0.119  0.140  0.196  0.185  0.237
DJA      0.441  0.34   0.139  0.002  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
DJI      0.451  0.374  0.015  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
SMI      0.502  0.692  0.407  0.370  0.211  0.264  0.351  0.374  0.463  0.533  0.623  0.700
S&P 500  0.647  0.540  0.012  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000

Portmanteau tests for adequacy of the GARCH(1,1)
CAC      0.312  0.379  0.523  0.229  0.301  0.396  0.495  0.578  0.672  0.660  0.704  0.743
SMI      0.586  0.758  0.908  0.959  0.986  0.995  0.996  0.999  0.999  0.999  0.999  0.999
S&P 500  0.598  0.364  0.528  0.643  0.673  0.394  0.512  0.535  0.639  0.432  0.496  0.594


Table 8.5 p-values for tests of the null of a GARCH(1,1) model against the GARCH(1,2), GARCH(1,3), GARCH(1,4) and GARCH(2,1) alternatives, for returns of stock market indices and exchange rates. p-values less than 5% are in bold, those less than 1% are underlined.


Alternative    GARCH(1,2)            GARCH(1,3)            GARCH(1,4)            GARCH(2,1)
               W_n    R_n    L_n     W_n    R_n    L_n     W_n    R_n    L_n     W_n    R_n    L_n


Daily returns of indices 

CAC 0.007 0.033 0.013 0.005 0.000 0.001 0.024 0.188 0.040 0.500 0.280 0.500 
DAX 0.002 0.001 0.003 0.001 0.000 0.000 0.001 0.162 0.014 0.350 0.031 0.143 
DJA 0.158 0.337 0.166 0.259 0.285 0.269 0.081 0.134 0.064 0.500 0.189 0.500 
DJI 0.044 0.100 0.049 0.088 0.071 0.094 0.107 0.143 0.114 0.500 0.012 0.500 
DJT 0.469 0.942 0.470 0.648 0.009 0.648 0.519 0.116 0.517 0.369 0.261 0.262 
DJU 0.500 0.000 0.500 0.643 0.000 0.643 0.725 0.001 0.725 0.017 0.000 0.005 
FTSE 0.080 0.122 0.071 0.093 0.223 0.083 0.213 0.423 0.205 0.458 0.843 0.442 
Nasdaq 0.469 0.922 0.468 0.579 0.983 0.578 0.683 0.995 0.702 0.500 0.928 0.500 
Nikkei 0.004 0.002 0.004 0.042 0.332 0.081 0.052 0.526 0.108 0.238 0.000 0.027 
SMI 0.224 0.530 0.245 0.058 0.202 0.063 0.086 0.431 0.108 0.500 0.932 0.500 
SP 500 0.053 0.079 0.047 0.089 0.035 0.078 0.055 0.052 0.043 0.500 0.045 0.500 
Weekly returns of indices 

CAC 0.017 0.143 0.049 0.028 0.272 0.068 0.061 0.478 0.142 0.500 0.575 0.500 
DAX 0.154 0.000 0.004 0.674 0.798 0.674 0.667 0.892 0.661 0.043 0.000 0.000 
DJA 0.194 0.001 0.052 0.692 0.607 0.692 0.679 0.899 0.597 0.003 0.000 0.000 
DJI 0.173 0.000 0.030 0.682 0.482 0.682 0.788 0.358 0.788 0.000 0.000 0.000 
DJT 0.428 0.623 0.385 0.628 0.456 0.628 0.693 0.552 0.693 0.002 0.000 0.004 
DJU 0.500 0.747 0.500 0.646 0.011 0.646 0.747 0.038 0.747 0.071 0.003 0.017 
FTSE 0.188 0.484 0.222 0.183 0.534 0.214 0.242 0.472 0.272 0.500 0.532 0.500 
Nasdaq 0.441 0.905 0.448 0.387 0.868 0.412 0.199 0.927 0.266 0.069 0.961 0.344 
Nikkei 0.500 0.140 0.500 0.310 0.154 0.260 0.330 0.316 0.462 0.030 0.138 0.053 
SMI 0.500 0.720 0.500 0.217 0.144 0.150 0.796 0.754 0.796 0.314 0.769 0.360 
SP 500 0.117 0.000 0.001 0.659 0.114 0.659 0.724 0.051 0.724 0.000 0.000 0.000 
Daily exchange rates 

$/€ 0.452 0.904 0.452 0.194 0.423 0.181 0.066 0.000 0.015 0.500 0.002 0.500 
¥/€ 0.037 0.000 0.002 0.616 0.090 0.618 0.304 0.000 0.227 0.136 0.000 0.000 
£/€ 0.439 0.879 0.440 0.471 0.905 0.464 0.677 0.981 0.677 0.258 0.493 0.248 
CHF/€ 0.141 0.000 0.012 0.641 0.152 0.641 0.520 0.154 0.562 0.012 0.000 0.000 
CHE 0.500 0.268 0.500 0.631 0.714 0.631 0.032 0.000 0.002 0.045 0.045 0.029 


(ii) the p-values of the three tests can be quite different; (iii) for most of the series, the GARCH(1,1) 
model is clearly rejected. Point (ii) is not surprising because the asymptotic equivalence between 
the three tests is only shown under the null hypothesis or under local alternatives. Moreover, 
because of the positivity constraints, it is possible (see, for instance, the DJU) that the estimated 
GARCH(1,2) model satisfies $\hat\alpha_2 = 0$ with $\partial\tilde l_n(\hat\theta_{n|2})/\partial\alpha_2 \gg 0$. In this case, when the estimator 
lies at the boundary of the parameter space and the score is strongly positive, the Wald and LR 
tests do not reject the GARCH(1,1) model, whereas the score test does reject it. In other situations, 
the Wald or LR test rejects the GARCH(1,1) model whereas the score test does not (see, for instance, the 
DAX for the GARCH(1,4) alternative). This study shows that it is often relevant to employ several 
tests and several alternatives. The conservative Bonferroni approach (rejecting if the minimal 
p-value multiplied by the number of tests is less than a given level $\alpha$) leads to rejection of the 
GARCH(1,1) model for 16 out of the 24 series in Table 8.5. Other procedures, less conservative 
than that of Bonferroni, could also be applied (see Wright, 1992) without changing the general 
conclusion. 
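The Bonferroni rule described above can be sketched in a few lines of Python. This is only an illustration. It assumes that the twelve p-values in a row of Table 8.5 are the tests to be combined for that series. The helper names are ours, not the book's.

```python
def bonferroni_adjusted(pvalues):
    """Bonferroni-adjusted p-value: min(1, m * smallest p-value), m = number of tests."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    return min(1.0, len(pvalues) * min(pvalues))


def bonferroni_rejects(pvalues, alpha=0.05):
    """Reject the null model (here the GARCH(1,1)) if m * min(p) < alpha."""
    return len(pvalues) * min(pvalues) < alpha


# Two rows copied from Table 8.5 (all p-values reported for each series)
ftse = [0.188, 0.484, 0.222, 0.183, 0.534, 0.214, 0.242, 0.472, 0.272, 0.500, 0.532, 0.500]
djt = [0.428, 0.623, 0.385, 0.628, 0.456, 0.628, 0.693, 0.552, 0.693, 0.002, 0.000, 0.004]
print(bonferroni_rejects(ftse), bonferroni_rejects(djt))  # False True
```

For the FTSE row the smallest p-value is 0.183, and 12 × 0.183 exceeds 0.05, so the GARCH(1,1) is not rejected. The DJT row contains a p-value of 0.000, so the GARCH(1,1) is rejected.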

In conclusion, this study shows that the GARCH(1,1) model is certainly overrepresented in 
empirical studies. The tests presented in this chapter are easily implemented and lead to selection 
of GARCH models that are more elaborate than the GARCH(1,1). 


8.6 Proofs of the Main Results 


Proof of Theorem 8.1 


We will split the proof into seven parts. 


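(a) Asymptotic normality of the score vector. When $\theta_0 \in \partial\Theta$, the function $\sigma_t^2(\theta)$ can take negative 
values in a neighborhood of $\theta_0$, and $\ell_t(\theta) = \epsilon_t^2/\sigma_t^2(\theta) + \log\sigma_t^2(\theta)$ is then undefined in this 
neighborhood. Thus the derivative of $\ell_t(\cdot)$ does not exist at $\theta_0$. By contrast, the right derivatives 
exist, and the vector $\partial\ell_t(\theta_0)/\partial\theta$ of the right partial derivatives is written as an ordinary derivative. 
The same convention is used for the higher-order derivatives, as well as for the right derivatives 
of $l_n$, $\tilde\ell_t$ and $\tilde l_n$ at $\theta_0$. With these conventions, the formulas for the derivatives of the criterion remain 
valid: 

$$\frac{\partial\ell_t(\theta)}{\partial\theta} = \left\{1-\frac{\epsilon_t^2}{\sigma_t^2}\right\}\left\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta}\right\},$$

$$\frac{\partial^2\ell_t(\theta)}{\partial\theta\,\partial\theta'} = \left\{1-\frac{\epsilon_t^2}{\sigma_t^2}\right\}\left\{\frac{1}{\sigma_t^2}\frac{\partial^2\sigma_t^2}{\partial\theta\,\partial\theta'}\right\} + \left\{2\frac{\epsilon_t^2}{\sigma_t^2}-1\right\}\left\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta}\right\}\left\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2}{\partial\theta'}\right\}.$$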


It is then easy to see that $J = E\,\partial^2\ell_t(\theta_0)/\partial\theta\,\partial\theta'$ exists under the moment assumption B7. The 
ergodic theorem immediately yields 

$$J_n \to J \quad \text{almost surely}, \qquad (8.35)$$

where $J$ is invertible, by assumptions B4 and B5 (cf. the proof of Theorem 7.2). The convergence 
(8.5) then directly follows from Slutsky's lemma and the central limit theorem given in 
Corollary A.1. 
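The two derivative formulas above can be checked numerically on the simplest case. This is a minimal sketch that assumes an ARCH(1) volatility $\sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2$, so $\theta = (\omega, \alpha)$ and $\partial\sigma_t^2/\partial\theta = (1, \epsilon_{t-1}^2)'$. Since $\sigma_t^2$ is linear in $\theta$, the second-derivative term of $\sigma_t^2$ vanishes. At a boundary point such as $\alpha = 0$, the same expression gives the right derivative discussed in the text. The function names are illustrative only.

```python
import math

# Sketch for an ARCH(1): sigma_t^2 = omega + alpha * e_prev^2, theta = (omega, alpha).

def ell(theta, e, e_prev):
    """Criterion term l_t(theta) = e_t^2 / sigma_t^2 + log sigma_t^2."""
    omega, alpha = theta
    s2 = omega + alpha * e_prev ** 2
    return e ** 2 / s2 + math.log(s2)

def grad_ell(theta, e, e_prev):
    """{1 - e_t^2/sigma_t^2} * {sigma_t^-2 * d sigma_t^2 / d theta}."""
    omega, alpha = theta
    s2 = omega + alpha * e_prev ** 2
    c = (1.0 - e ** 2 / s2) / s2
    return [c, c * e_prev ** 2]

def hess_ell(theta, e, e_prev):
    """{2 e_t^2/sigma_t^2 - 1} * (sigma_t^-2 d sigma_t^2)(sigma_t^-2 d sigma_t^2)'.

    The term involving d^2 sigma_t^2 is zero because sigma_t^2 is linear in theta.
    """
    omega, alpha = theta
    s2 = omega + alpha * e_prev ** 2
    d = [1.0, e_prev ** 2]
    c = (2.0 * e ** 2 / s2 - 1.0) / s2 ** 2
    return [[c * d[i] * d[j] for j in range(2)] for i in range(2)]

def central_diff(fun, theta, i, h=1e-6):
    """Central finite difference of fun along coordinate i of theta."""
    up, down = list(theta), list(theta)
    up[i] += h
    down[i] -= h
    return (fun(up) - fun(down)) / (2 * h)
```

Comparing `grad_ell` and `hess_ell` with finite differences of `ell` and `grad_ell` confirms the formulas. A forward difference at $\alpha = 0$ recovers the right derivative.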


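(b) Uniform integrability and continuity of the second-order derivatives. It will be shown that, 
for all $\varepsilon > 0$, there exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that, almost surely, 

$$E_{\theta_0}\sup_{\theta\in V(\theta_0)\cap\Theta}\left\|\frac{\partial^2\ell_t(\theta)}{\partial\theta\,\partial\theta'}\right\| < \infty \qquad (8.36)$$

and 

$$\lim_{n\to\infty}\frac1n\sum_{t=1}^n\sup_{\theta\in V(\theta_0)\cap\Theta}\left\|\frac{\partial^2\ell_t(\theta)}{\partial\theta\,\partial\theta'}-\frac{\partial^2\ell_t(\theta_0)}{\partial\theta\,\partial\theta'}\right\| \le \varepsilon. \qquad (8.37)$$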


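Using elementary derivative calculations and the compactness of $\Theta$, it can be seen that 

$$\frac{\partial\sigma_t^2(\theta)}{\partial\theta_i} = b_0^{(i)}(\theta) + \sum_{k=1}^\infty b_k^{(i)}(\theta)\epsilon_{t-k}^2 \quad\text{and}\quad \frac{\partial^2\sigma_t^2(\theta)}{\partial\theta_i\,\partial\theta_j} = b_0^{(i,j)}(\theta) + \sum_{k=1}^\infty b_k^{(i,j)}(\theta)\epsilon_{t-k}^2,$$

with 

$$\sup_{\theta\in\Theta}\left|b_k^{(i)}(\theta)\right| \le K\rho^k, \qquad \sup_{\theta\in\Theta}\left|b_k^{(i,j)}(\theta)\right| \le K\rho^k,$$

where $K > 0$ and $0 < \rho < 1$. Since $\sup_{\theta\in\Theta}1/\sigma_t^2(\theta) \le K$, assumption B7 then entails that 

$$\left\|\sup_{\theta\in\Theta}\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2(\theta)}{\partial\theta}\right\|_3 < \infty, \qquad \left\|\sup_{\theta\in\Theta}\frac{1}{\sigma_t^2}\frac{\partial^2\sigma_t^2(\theta)}{\partial\theta\,\partial\theta'}\right\|_3 < \infty, \qquad \left\|\sup_{\theta\in\Theta}\frac{\epsilon_t^2}{\sigma_t^2}\right\|_3 < \infty.$$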


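In view of (8.34), the Hölder and Minkowski inequalities then show (8.36) for every neighborhood 
of $\theta_0$. The ergodic theorem entails that 

$$\lim_{n\to\infty}\frac1n\sum_{t=1}^n\sup_{\theta\in V(\theta_0)\cap\Theta}\left\|\frac{\partial^2\ell_t(\theta)}{\partial\theta\,\partial\theta'}-\frac{\partial^2\ell_t(\theta_0)}{\partial\theta\,\partial\theta'}\right\| = E_{\theta_0}\sup_{\theta\in V(\theta_0)\cap\Theta}\left\|\frac{\partial^2\ell_t(\theta)}{\partial\theta\,\partial\theta'}-\frac{\partial^2\ell_t(\theta_0)}{\partial\theta\,\partial\theta'}\right\| \quad\text{a.s.}$$

This expectation decreases to 0 when the neighborhood $V(\theta_0)$ decreases to the singleton $\{\theta_0\}$ (by the continuity of the second-order derivatives and dominated convergence, using (8.36)), which 
shows (8.37). 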


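(c) Convergence in probability of $\theta_{J_n}(Z_n)$ to $\theta_0$ at rate $\sqrt n$. In view of (8.35), for $n$ large enough, 
$\|x\|_{J_n} = (x'J_nx)^{1/2}$ defines a norm. The definition (8.6) of $\theta_{J_n}(Z_n)$ entails that $\|\sqrt n(\theta_{J_n}(Z_n)-\theta_0)-Z_n\|_{J_n} \le \|Z_n\|_{J_n}$, since the value $\theta = \theta_0 \in \Theta$ is admissible in the minimization. The triangle inequality then implies that 

$$\|\sqrt n(\theta_{J_n}(Z_n)-\theta_0)\|_{J_n} = \|\sqrt n(\theta_{J_n}(Z_n)-\theta_0)-Z_n+Z_n\|_{J_n} \le \|\sqrt n(\theta_{J_n}(Z_n)-\theta_0)-Z_n\|_{J_n}+\|Z_n\|_{J_n} \le 2(Z_n'J_nZ_n)^{1/2} = O_P(1),$$

where the last equality comes from the convergence in law of $(Z_n, J_n)$ to $(Z, J)$. This entails that 
$\theta_{J_n}(Z_n)-\theta_0 = O_P(n^{-1/2})$. 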


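(d) Quadratic approximation of the objective function. A Taylor expansion yields 

$$\tilde l_n(\theta) = \tilde l_n(\theta_0) + \frac{\partial\tilde l_n(\theta_0)}{\partial\theta'}(\theta-\theta_0) + \frac12(\theta-\theta_0)'\left[\frac{\partial^2\tilde l_n(\theta^*)}{\partial\theta\,\partial\theta'}\right](\theta-\theta_0)$$
$$= \tilde l_n(\theta_0) + \frac{\partial\tilde l_n(\theta_0)}{\partial\theta'}(\theta-\theta_0) + \frac12(\theta-\theta_0)'\left[\frac{\partial^2\tilde l_n(\theta_0)}{\partial\theta\,\partial\theta'}\right](\theta-\theta_0) + R_n(\theta),$$

where 

$$R_n(\theta) = \frac12(\theta-\theta_0)'\left\{\frac{\partial^2\tilde l_n(\theta^*)}{\partial\theta\,\partial\theta'}-\frac{\partial^2\tilde l_n(\theta_0)}{\partial\theta\,\partial\theta'}\right\}(\theta-\theta_0)$$

and $\theta^*$ is between $\theta$ and $\theta_0$. Note that (8.37) implies that, for any sequence $(\theta_n)$ such that $\theta_n-\theta_0 = 
o_P(1)$, we have $R_n(\theta_n) = o_P(\|\theta_n-\theta_0\|^2)$. In particular, in view of (c), we have $R_n\{\theta_{J_n}(Z_n)\} = 
o_P(n^{-1})$. Introducing the vector $Z_n$ defined by (8.3), we can write 

$$\frac{\partial l_n(\theta_0)}{\partial\theta'}(\theta-\theta_0) = -\frac1nZ_n'J_n\sqrt n(\theta-\theta_0)$$

and 

$$\tilde l_n(\theta)-\tilde l_n(\theta_0) = -\frac{1}{2n}Z_n'J_n\sqrt n(\theta-\theta_0) - \frac{1}{2n}\sqrt n(\theta-\theta_0)'J_nZ_n + \frac12(\theta-\theta_0)'J_n(\theta-\theta_0) + R_n(\theta) + R_n^*(\theta)$$
$$= \frac{1}{2n}\|Z_n-\sqrt n(\theta-\theta_0)\|_{J_n}^2 - \frac{1}{2n}Z_n'J_nZ_n + R_n(\theta) + R_n^*(\theta), \qquad (8.38)$$

where 

$$R_n^*(\theta) = \left\{\frac{\partial\tilde l_n(\theta_0)}{\partial\theta'}-\frac{\partial l_n(\theta_0)}{\partial\theta'}\right\}(\theta-\theta_0) + \frac12(\theta-\theta_0)'\left\{\frac{\partial^2\tilde l_n(\theta_0)}{\partial\theta\,\partial\theta'}-J_n\right\}(\theta-\theta_0).$$

The initial conditions are asymptotically negligible, even when the parameter stays at the boundary. 
Since result (d) on page 160 remains valid, we have $R_n^*(\theta_n) = o_P(n^{-1/2}\|\theta_n-\theta_0\|)$ for any sequence 
$(\theta_n)$ such that $\theta_n \to \theta_0$ in probability. 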


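(e) Convergence in probability of $\hat\theta_n$ to $\theta_0$ at rate $n^{1/2}$. We know that 

$$\sqrt n(\hat\theta_n-\theta_0) = O_P(1)$$

when $\theta_0 \in \mathring{\Theta}$. We will show that this result remains valid when $\theta_0 \in \partial\Theta$. Theorem 7.1 applies. 
In view of (d), the almost sure convergence of $\hat\theta_n$ to $\theta_0$ and of $J_n$ to the nonsingular matrix $J$, 
we have 

$$R_n(\hat\theta_n) = o_P\left(\|\hat\theta_n-\theta_0\|^2\right) = o_P\left(\|\hat\theta_n-\theta_0\|_{J_n}^2\right)$$

and 

$$R_n^*(\hat\theta_n) = o_P\left(n^{-1/2}\|\hat\theta_n-\theta_0\|_{J_n}\right).$$

Since $\tilde l_n(\cdot)$ is minimized at $\hat\theta_n$, we have 

$$\tilde l_n(\hat\theta_n)-\tilde l_n(\theta_0) = \frac{1}{2n}\left\{\|Z_n-\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}^2 - \|Z_n\|_{J_n}^2 + o_P\left(\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}\right) + o_P\left(\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}^2\right)\right\} \le 0.$$

It follows that 

$$\|Z_n-\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}^2 \le \|Z_n\|_{J_n}^2 + o_P\left(\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}\right) + o_P\left(\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}^2\right) \le \left\{\|Z_n\|_{J_n} + o_P\left(\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}\right)\right\}^2,$$

where the last inequality follows from $\|Z_n\|_{J_n} = O_P(1)$. The triangle inequality then yields 

$$\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n} \le \|\sqrt n(\hat\theta_n-\theta_0)-Z_n\|_{J_n} + \|Z_n\|_{J_n} \le 2\|Z_n\|_{J_n} + o_P\left(\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}\right).$$

Thus $\|\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}\{1+o_P(1)\} \le 2\|Z_n\|_{J_n} = O_P(1)$. 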
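(f) Approximation of $\|Z_n-\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}^2$ by $\|Z_n-\lambda_n^\Lambda\|_{J_n}^2$. We have 

$$0 \le \|Z_n-\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}^2 - \|Z_n-\sqrt n(\theta_{J_n}(Z_n)-\theta_0)\|_{J_n}^2$$
$$= 2n\left\{\tilde l_n(\hat\theta_n)-\tilde l_n(\theta_{J_n}(Z_n))\right\} - 2n\left\{(R_n+R_n^*)(\hat\theta_n)-(R_n+R_n^*)(\theta_{J_n}(Z_n))\right\}$$
$$\le -2n\left\{(R_n+R_n^*)(\hat\theta_n)-(R_n+R_n^*)(\theta_{J_n}(Z_n))\right\} = o_P(1),$$

where the first line comes from the definition of $\theta_{J_n}(Z_n)$, the second line comes from (8.38), and 
the inequality in the third line follows from the fact that $\tilde l_n(\cdot)$ is minimized at $\hat\theta_n$. The final equality 
was shown in (d), using the rates established in (c) and (e). In view of (8.8), we conclude that 

$$\|Z_n-\sqrt n(\hat\theta_n-\theta_0)\|_{J_n}^2 - \|Z_n-\lambda_n^\Lambda\|_{J_n}^2 = o_P(1). \qquad (8.39)$$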
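(g) Approximation of $\sqrt n(\hat\theta_n-\theta_0)$ by $\lambda_n^\Lambda$. The vector $\lambda_n^\Lambda$, which is the projection of $Z_n$ on $\Lambda$ with 
respect to the scalar product $\langle x, y\rangle_{J_n} := x'J_ny$, is characterized (see Lemma 1.1 in Zarantonello, 
1971) by 

$$\lambda_n^\Lambda \in \Lambda, \qquad \langle Z_n-\lambda_n^\Lambda, \lambda_n^\Lambda-\lambda\rangle_{J_n} \ge 0, \quad \forall\lambda\in\Lambda$$

(see Figure 8.1). Since $\hat\theta_n \in \Theta$ and $\Lambda = \lim\uparrow\sqrt n(\Theta-\theta_0)$, we have almost surely $\sqrt n(\hat\theta_n-\theta_0) \in 
\Lambda$ for $n$ large enough. The characterization then entails 

$$\|\sqrt n(\hat\theta_n-\theta_0)-Z_n\|_{J_n}^2 = \|\sqrt n(\hat\theta_n-\theta_0)-\lambda_n^\Lambda\|_{J_n}^2 + \|\lambda_n^\Lambda-Z_n\|_{J_n}^2 + 2\langle\sqrt n(\hat\theta_n-\theta_0)-\lambda_n^\Lambda, \lambda_n^\Lambda-Z_n\rangle_{J_n}$$
$$\ge \|\sqrt n(\hat\theta_n-\theta_0)-\lambda_n^\Lambda\|_{J_n}^2 + \|\lambda_n^\Lambda-Z_n\|_{J_n}^2.$$

Using (8.39), this yields 

$$\|\sqrt n(\hat\theta_n-\theta_0)-\lambda_n^\Lambda\|_{J_n}^2 \le \|\sqrt n(\hat\theta_n-\theta_0)-Z_n\|_{J_n}^2 - \|Z_n-\lambda_n^\Lambda\|_{J_n}^2 = o_P(1),$$

which entails (8.9), and completes the proof. 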
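The projection $\lambda_n^\Lambda$ of $Z_n$ on the cone $\Lambda$ in the $J_n$-norm is a small quadratic program. The sketch below assumes a cone of the form $\Lambda = \{\lambda : \lambda_i \ge 0 \text{ for } i \in \mathcal I\}$, which is the shape obtained when some GARCH coefficients are equal to zero. `project_on_cone` is a hypothetical helper written for illustration, not a routine from the book. It enumerates every possible set of constrained coordinates pinned at zero. This is exact, but it is only practical for a handful of constrained coefficients.

```python
import itertools
import numpy as np

def project_on_cone(z, J, constrained):
    """Projection of z on {x : x_i >= 0, i in constrained} for ||x||_J = sqrt(x'Jx).

    For a fixed set S of coordinates set to zero, the minimizer of
    (x - z)'J(x - z) has x_S = 0 and x_F = z_F + J_FF^{-1} J_FS z_S.
    The best feasible candidate over all S is the projection (J positive definite).
    """
    z = np.asarray(z, dtype=float)
    J = np.asarray(J, dtype=float)
    d = z.size
    best, best_val = None, np.inf
    for r in range(len(constrained) + 1):
        for S in itertools.combinations(constrained, r):
            S = list(S)
            F = [i for i in range(d) if i not in S]
            x = np.zeros(d)
            x[F] = z[F]
            if S and F:
                J_FF = J[np.ix_(F, F)]
                J_FS = J[np.ix_(F, S)]
                x[F] += np.linalg.solve(J_FF, J_FS @ z[S])
            if np.all(x[list(constrained)] >= -1e-12):      # feasibility of the candidate
                val = (x - z) @ J @ (x - z)
                if val < best_val:
                    best, best_val = x, val
    return best
```

A convenient sanity check is the obtuse-angle characterization above: $\langle z-\lambda, \lambda-\mu\rangle_J \ge 0$ for every $\mu$ in the cone. When $J$ is diagonal, the projection reduces to clipping the constrained coordinates at zero.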


Proof of Proposition 8.3 


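The first result is an immediate consequence of Slutsky's lemma and of the fact that, under $H_0$, 

$$\sqrt n\,\hat\theta_n^{(2)} = K\sqrt n(\hat\theta_n-\theta_0) \xrightarrow{d} K\lambda^\Lambda, \qquad K'\left\{(\hat\kappa_\eta-1)K\hat J^{-1}K'\right\}^{-1}K \to \Omega.$$

To show (8.23) in the standard case where $\theta_0 \in \mathring{\Theta}$, the asymptotic $\chi^2_{d_2}$ distribution is established 
by showing that $R_n - W_n = o_P(1)$. This equation does not hold true in our testing problem 
$H_0: \theta_0^{(2)} = 0$, where $\theta_0$ is on the boundary of $\Theta$. Moreover, the asymptotic distribution of $W_n$ 
is not $\chi^2_{d_2}$. A more direct proof is thus necessary. 

Since $\hat\theta^{(1)}_{n|2}$ is a consistent estimator of $\theta_0^{(1)} > 0$, we have, for $n$ large enough, $\hat\theta^{(1)}_{n|2} > 0$ and 

$$\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta_i} = 0 \quad\text{for } i = 1,\dots,d_1.$$

Let 

$$\hat\theta_{n|2} = \begin{pmatrix}\hat\theta^{(1)}_{n|2}\\0_{d_2}\end{pmatrix} = \tilde K'\hat\theta^{(1)}_{n|2},$$

where $\tilde K = (I_{d_1}, 0_{d_1\times d_2})$. We again obtain 

$$\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta} = K'\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta^{(2)}}, \qquad K = (0_{d_2\times d_1}, I_{d_2}). \qquad (8.40)$$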
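Since 

$$\frac{\partial^2\tilde l_n(\theta_0)}{\partial\theta\,\partial\theta'} \to J \quad\text{almost surely}, \qquad (8.41)$$

a Taylor expansion shows that 

$$\sqrt n\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta} \overset{o_P(1)}{=} \sqrt n\frac{\partial\tilde l_n(\theta_0)}{\partial\theta} + J\sqrt n(\hat\theta_{n|2}-\theta_0),$$

where $a \overset{c}{=} b$ means $a = b + c$. The last $d_2$ components of this vectorial relation yield 

$$\sqrt n\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta^{(2)}} \overset{o_P(1)}{=} \sqrt n\frac{\partial\tilde l_n(\theta_0)}{\partial\theta^{(2)}} + KJ\sqrt n(\hat\theta_{n|2}-\theta_0), \qquad (8.42)$$

and the first $d_1$ components yield 

$$0 \overset{o_P(1)}{=} \sqrt n\frac{\partial\tilde l_n(\theta_0)}{\partial\theta^{(1)}} + \tilde KJ\tilde K'\sqrt n(\hat\theta^{(1)}_{n|2}-\theta_0^{(1)}),$$

using 

$$\hat\theta_{n|2}-\theta_0 = \tilde K'(\hat\theta^{(1)}_{n|2}-\theta_0^{(1)}). \qquad (8.43)$$

We thus have 

$$\sqrt n(\hat\theta^{(1)}_{n|2}-\theta_0^{(1)}) \overset{o_P(1)}{=} -(\tilde KJ\tilde K')^{-1}\sqrt n\frac{\partial\tilde l_n(\theta_0)}{\partial\theta^{(1)}}. \qquad (8.44)$$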


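Using successively (8.40), (8.42) and (8.43), we obtain 

$$R_n = \frac{n}{\hat\kappa_\eta-1}\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta^{(2)'}}K\hat J^{-1}K'\frac{\partial\tilde l_n(\hat\theta_{n|2})}{\partial\theta^{(2)}}$$
$$\overset{o_P(1)}{=} \frac{n}{\kappa_\eta-1}\left\{\frac{\partial l_n(\theta_0)}{\partial\theta^{(2)}}+KJ\tilde K'(\hat\theta^{(1)}_{n|2}-\theta_0^{(1)})\right\}'KJ^{-1}K'\left\{\frac{\partial l_n(\theta_0)}{\partial\theta^{(2)}}+KJ\tilde K'(\hat\theta^{(1)}_{n|2}-\theta_0^{(1)})\right\}.$$

Let 

$$\sqrt n\frac{\partial l_n(\theta_0)}{\partial\theta} \xrightarrow{d} W = \begin{pmatrix}W_1\\W_2\end{pmatrix} \sim N\{0,(\kappa_\eta-1)J\}, \qquad J = \begin{pmatrix}J_{11}&J_{12}\\J_{21}&J_{22}\end{pmatrix},$$

where $W_1$ and $W_2$ are vectors of respective sizes $d_1$ and $d_2$, and $J_{11}$ is of size $d_1\times d_1$. Thus 

$$\tilde KJ\tilde K' = J_{11}, \qquad KJ\tilde K' = J_{21}, \qquad KJK' = J_{22}, \qquad KJ^{-1}K' = (J_{22}-J_{21}J_{11}^{-1}J_{12})^{-1},$$

where the last equality comes from Exercise 6.7. Using (8.44), the asymptotic distribution of $R_n$ 
is thus that of 

$$\frac{1}{\kappa_\eta-1}\left(W_2-J_{21}J_{11}^{-1}W_1\right)'\left(J_{22}-J_{21}J_{11}^{-1}J_{12}\right)^{-1}\left(W_2-J_{21}J_{11}^{-1}W_1\right),$$

which follows a $\chi^2_{d_2}$ distribution because it is easy to check that 

$$W_2-J_{21}J_{11}^{-1}W_1 \sim N\left\{0,(\kappa_\eta-1)\left(J_{22}-J_{21}J_{11}^{-1}J_{12}\right)\right\}.$$

We have thus shown (8.23). 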
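Now we show (8.24). Using (8.43) and (8.44), several Taylor expansions yield 

$$n\tilde l_n(\hat\theta_{n|2}) \overset{o_P(1)}{=} n\tilde l_n(\theta_0) + n\frac{\partial\tilde l_n(\theta_0)}{\partial\theta'}(\hat\theta_{n|2}-\theta_0) + \frac n2(\hat\theta_{n|2}-\theta_0)'J(\hat\theta_{n|2}-\theta_0)$$
$$\overset{o_P(1)}{=} n\tilde l_n(\theta_0) - \frac n2\frac{\partial l_n(\theta_0)}{\partial\theta^{(1)'}}(\tilde KJ\tilde K')^{-1}\frac{\partial l_n(\theta_0)}{\partial\theta^{(1)}}$$

and 

$$n\tilde l_n(\hat\theta_n) \overset{o_P(1)}{=} n\tilde l_n(\theta_0) + n\frac{\partial\tilde l_n(\theta_0)}{\partial\theta'}(\hat\theta_n-\theta_0) + \frac n2(\hat\theta_n-\theta_0)'J(\hat\theta_n-\theta_0).$$

By subtraction, 

$$L_n \overset{o_P(1)}{=} -n\left\{\frac12\frac{\partial l_n(\theta_0)}{\partial\theta^{(1)'}}(\tilde KJ\tilde K')^{-1}\frac{\partial l_n(\theta_0)}{\partial\theta^{(1)}} + \frac{\partial l_n(\theta_0)}{\partial\theta'}(\hat\theta_n-\theta_0) + \frac12(\hat\theta_n-\theta_0)'J(\hat\theta_n-\theta_0)\right\}.$$

Since 

$$\begin{pmatrix}\sqrt n\,\partial l_n(\theta_0)/\partial\theta\\\sqrt n(\hat\theta_n-\theta_0)\end{pmatrix} \xrightarrow{d} \begin{pmatrix}-JZ\\\lambda^\Lambda\end{pmatrix},$$

the asymptotic distribution of $L_n$ is that of 

$$L = -\frac12Z'J\tilde K'J_{11}^{-1}\tilde KJZ + Z'J\lambda^\Lambda - \frac12\lambda^{\Lambda'}J\lambda^\Lambda.$$

Moreover, it can easily be verified that 

$$J\tilde K'J_{11}^{-1}\tilde KJ = J-(\kappa_\eta-1)\Omega, \quad\text{where}\quad (\kappa_\eta-1)\Omega = \begin{pmatrix}0&0\\0&J_{22}-J_{21}J_{11}^{-1}J_{12}\end{pmatrix}.$$

It follows that 

$$L = -\frac12Z'JZ + \frac{\kappa_\eta-1}{2}Z'\Omega Z + Z'J\lambda^\Lambda - \frac12\lambda^{\Lambda'}J\lambda^\Lambda = -\frac12(\lambda^\Lambda-Z)'J(\lambda^\Lambda-Z) + \frac{\kappa_\eta-1}{2}Z'\Omega Z,$$

which gives the first equality of (8.24). The second equality follows using Exercise 8.2. 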
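The proof relies on two matrix identities. The first is the Schur-complement form of $KJ^{-1}K'$ from Exercise 6.7. The second is $J\tilde K'J_{11}^{-1}\tilde KJ = J-(\kappa_\eta-1)\Omega$. The proof also relies on the $\chi^2_{d_2}$ limit of the score statistic. All three can be checked numerically. The NumPy sketch below is for illustration only: an arbitrary positive definite $J$ and an arbitrary $\kappa_\eta$ stand in for the quantities of a fitted GARCH model. The function names are ours.

```python
import numpy as np

def _blocks(J, d1):
    """Split J into J11, J12, J21, J22 with J11 of size d1 x d1."""
    return J[:d1, :d1], J[:d1, d1:], J[d1:, :d1], J[d1:, d1:]

def identity_residuals(J, d1):
    """Max absolute residuals of the two identities used in the proof of Proposition 8.3."""
    d = J.shape[0]
    d2 = d - d1
    K = np.hstack([np.zeros((d2, d1)), np.eye(d2)])      # K = (0, I_{d2})
    Kt = np.hstack([np.eye(d1), np.zeros((d1, d2))])     # tilde K = (I_{d1}, 0)
    J11, J12, J21, J22 = _blocks(J, d1)
    schur = J22 - J21 @ np.linalg.solve(J11, J12)
    # (i) (K J^{-1} K')^{-1} = J22 - J21 J11^{-1} J12
    res1 = np.linalg.inv(K @ np.linalg.inv(J) @ K.T) - schur
    # (ii) J Kt' J11^{-1} Kt J = J - K' schur K, i.e. J - (kappa - 1) Omega
    res2 = J @ Kt.T @ np.linalg.solve(J11, Kt @ J) - (J - K.T @ schur @ K)
    return np.abs(res1).max(), np.abs(res2).max()

def score_stat_draws(J, d1, kappa, size, seed=0):
    """Draws of (W2 - J21 J11^{-1} W1)' schur^{-1} (W2 - J21 J11^{-1} W1) / (kappa - 1)."""
    rng = np.random.default_rng(seed)
    J11, J12, J21, J22 = _blocks(J, d1)
    schur = J22 - J21 @ np.linalg.solve(J11, J12)
    W = rng.multivariate_normal(np.zeros(J.shape[0]), (kappa - 1) * J, size=size)
    V = W[:, d1:] - W[:, :d1] @ np.linalg.solve(J11, J12)   # each row is (W2 - J21 J11^{-1} W1)'
    return np.einsum("ij,ij->i", V @ np.linalg.inv(schur), V) / (kappa - 1)
```

With $d_2$ constrained coefficients, the simulated statistic should have mean close to $d_2$ and variance close to $2d_2$, as a $\chi^2_{d_2}$ variable does.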
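Proof of Proposition 8.4 
Since $\operatorname{Cov}(\nu_t, \sigma_t^2) = 0$, we have 

$$1-\frac{\operatorname{Var}(\nu_t)}{\operatorname{Var}(\epsilon_t^2)} = \frac{\operatorname{Var}(\sigma_t^2)}{\operatorname{Var}(\epsilon_t^2)} = \frac{\operatorname{Var}\left(\omega_0+\sum_{i=1}^q\alpha_{0i}\epsilon_{t-i}^2\right)}{\operatorname{Var}(\epsilon_t^2)} = \sum_{i=1}^q\alpha_{0i}^2 + 2\sum_{i<j}\alpha_{0i}\alpha_{0j}\frac{\operatorname{Cov}(\epsilon_{t-i}^2,\epsilon_{t-j}^2)}{\operatorname{Var}(\epsilon_t^2)},$$

and the result follows from $\rho_{\epsilon^2}(i) > 0$ (see Proposition 2.2). 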
Proof of Theorem 8.2 

We first study the asymptotic impact of the unknown initial values on the statistic $\hat r_m$. Introduce 
the vector $r_m = (r(1),\dots,r(m))'$, where 

$$r(h) = n^{-1}\sum_{t=h+1}^n s_ts_{t-h}, \qquad\text{with } s_t = \eta_t^2-1 \text{ and } 0 < h < n.$$
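The autocovariances $r(h)$ just defined are simple to compute. Here is a minimal Python sketch that builds $r_m$ from a sequence $\eta_1,\dots,\eta_n$. In practice the residuals $\tilde\eta_t(\hat\theta_n)$ would be plugged in to obtain $\hat r_m$. The function name is illustrative.

```python
def squared_residual_autocov(eta, m):
    """Return [r(1), ..., r(m)] with r(h) = n^{-1} sum_{t=h+1}^{n} s_t s_{t-h}, s_t = eta_t^2 - 1."""
    n = len(eta)
    if not 0 < m < n:
        raise ValueError("need 0 < m < n")
    s = [x * x - 1.0 for x in eta]
    # Python index t runs over h..n-1, i.e. t = h+1..n in the 1-based notation of the text;
    # the normalization is n (not n - h), as in the definition of r(h).
    return [sum(s[t] * s[t - h] for t in range(h, n)) / n for h in range(1, m + 1)]
```

For example, with $\eta = (\sqrt2, 0, \sqrt2, 0)$, the $s_t$ alternate between $1$ and $-1$, and the function returns approximately $(-0.75, 0.5, -0.25)$.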


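Let $s_t(\theta)$ ($\tilde s_t(\theta)$) be the random variable obtained by replacing $\eta_t$ by $\eta_t(\theta) = \epsilon_t/\sigma_t(\theta)$ ($\tilde\eta_t(\theta) = 
\epsilon_t/\tilde\sigma_t(\theta)$) in $s_t$. The vectors $r_m(\theta)$ and $\tilde r_m(\theta)$ are defined similarly, so that $r_m = r_m(\theta_0)$ and 
$\hat r_m = \tilde r_m(\hat\theta_n)$. Write $a \overset{c}{=} b$ when $a = b + c$. Using (7.30) and the arguments used to show (d) on 
page 160, it can be shown that, as $n\to\infty$, 

$$\sqrt n\left\{r_m-\tilde r_m(\theta_0)\right\} = o_P(1), \qquad \sup_{\theta\in\Theta}\left\|\frac{\partial r_m(\theta)}{\partial\theta'}-\frac{\partial\tilde r_m(\theta)}{\partial\theta'}\right\| = o_P(1). \qquad (8.45)$$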


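We now show that the asymptotic distribution of $\sqrt n\hat r_m$ is a function of the joint asymptotic 
distribution of $\sqrt n r_m$ and of the QMLE. By the arguments used to show (c) on page 160, it can 
be shown that there exists a neighborhood $V(\theta_0)$ of $\theta_0$ such that 

$$\lim_{n\to\infty}E\sup_{\theta\in V(\theta_0)}\left|\frac{\partial^2r_m(\theta)}{\partial\theta_i\,\partial\theta_j}\right| < \infty \quad\text{for all } i,j\in\{1,\dots,p+q+1\}. \qquad (8.46)$$

Using (8.45) and the fact that $\sqrt n(\hat\theta_n-\theta_0) = O_P(1)$, a Taylor expansion of $\tilde r_m(\cdot)$ around $\hat\theta_n$ and 
$\theta_0$ shows that 

$$\sqrt n\hat r_m = \sqrt n\tilde r_m(\theta_0) + \frac{\partial\tilde r_m(\theta^*)}{\partial\theta'}\sqrt n(\hat\theta_n-\theta_0) \overset{o_P(1)}{=} \sqrt n r_m + \frac{\partial r_m(\theta^*)}{\partial\theta'}\sqrt n(\hat\theta_n-\theta_0)$$

for some $\theta^*$ between $\hat\theta_n$ and $\theta_0$. Using (8.46), the ergodic theorem, the strong consistency of the 
QMLE, and a second Taylor expansion, we obtain 

$$\frac{\partial r_m(\theta^*)}{\partial\theta'} \overset{o_P(1)}{=} \frac{\partial r_m(\theta_0)}{\partial\theta'} \to C_m := \begin{pmatrix}c_1'\\\vdots\\c_m'\end{pmatrix} \quad\text{a.s.},$$

where 

$$c_h = E\left\{s_{t-h}\frac{\partial s_t(\theta_0)}{\partial\theta}\right\} = -E\left\{s_{t-h}\frac{1}{\sigma_t^2(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\right\}.$$

To obtain the limit $C_m$, we use the fact that $E\{s_t\,\partial s_{t-h}(\theta_0)/\partial\theta\} = 0$. It follows that 

$$\sqrt n\hat r_m \overset{o_P(1)}{=} \sqrt n r_m + C_m\sqrt n(\hat\theta_n-\theta_0). \qquad (8.47)$$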
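We now derive the asymptotic distribution of $\sqrt n\,(r_m', (\hat\theta_n-\theta_0)')'$. In the proof of Theorem 7.2, 
it is shown that 

$$\sqrt n(\hat\theta_n-\theta_0) \overset{o_P(1)}{=} \frac{1}{\sqrt n}\sum_{t=1}^n\Delta_t \xrightarrow{d} N\left\{0,(\kappa_\eta-1)J^{-1}\right\} \qquad (8.48)$$

as $n\to\infty$, where 

$$\Delta_t = J^{-1}s_t\frac{1}{\sigma_t^2(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}.$$

Note that $r_m \overset{o_P(1)}{=} n^{-1}\sum_{t=1}^ns_ts_{t-1:t-m}$, where $s_{t-1:t-m} = (s_{t-1},\dots,s_{t-m})'$. In view of (8.48), the 
central limit theorem applied to the martingale difference $\{(\Delta_t', s_ts_{t-1:t-m}')'; \sigma(\eta_u, u\le t)\}$ 
shows that 

$$\sqrt n\begin{pmatrix}\hat\theta_n-\theta_0\\r_m\end{pmatrix} \overset{o_P(1)}{=} \frac{1}{\sqrt n}\sum_{t=1}^n\begin{pmatrix}\Delta_t\\s_ts_{t-1:t-m}\end{pmatrix} \xrightarrow{d} N\left\{0,\begin{pmatrix}(\kappa_\eta-1)J^{-1}&\Sigma_{\hat\theta r_m}\\\Sigma_{\hat\theta r_m}'&(\kappa_\eta-1)^2I_m\end{pmatrix}\right\}, \qquad (8.49)$$

where 

$$\Sigma_{\hat\theta r_m} = E\,\Delta_ts_ts_{t-1:t-m}' = (\kappa_\eta-1)J^{-1}E\left\{\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}s_{t-1:t-m}'\right\} = -(\kappa_\eta-1)J^{-1}C_m'.$$

Using (8.47) and (8.49) together, we obtain 

$$\sqrt n\hat r_m \xrightarrow{d} N(0, D), \qquad D = (\kappa_\eta-1)^2I_m-(\kappa_\eta-1)C_mJ^{-1}C_m'.$$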


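We now show that $D$ is invertible. Because the law of $\eta_t^2$ is nondegenerate, we have $\kappa_\eta > 1$. 
We thus have to show the invertibility of 

$$(\kappa_\eta-1)I_m-C_mJ^{-1}C_m' = EVV', \qquad V = s_{t-1:t-m}+C_mJ^{-1}\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}.$$

If the previous matrix is singular, then there exists $\lambda = (\lambda_1,\dots,\lambda_m)'$ such that $\lambda \ne 0$ and 

$$\lambda's_{t-1:t-m}+\mu'\frac{1}{\sigma_t^2}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta} = 0 \quad\text{a.s.}, \qquad (8.50)$$

with $\mu' = \lambda'C_mJ^{-1}$. Note that $\mu = (\mu_1,\dots,\mu_{p+q+1})' \ne 0$. Otherwise $\lambda's_{-1:-m} = 0$ a.s., which 
implies that there exists $j \in \{1,\dots,m\}$ such that $s_{-j}$ is measurable with respect to $\sigma\{s_t, t \ne -j\}$. 
This is impossible because the $s_t$ are independent and nondegenerate by assumption A3 on page 144 
(see Exercise 11.3). Denoting by $R_t$ any random variable measurable with respect to $\sigma\{\eta_u, u \le t\}$, 
we have 

$$\mu'\frac{\partial\sigma_0^2(\theta_0)}{\partial\theta} = \mu_2\sigma_{-1}^2\eta_{-1}^2+R_{-2}$$

and 

$$\sigma_0^2\lambda's_{-1:-m} = (\alpha_{01}\sigma_{-1}^2\eta_{-1}^2+R_{-2})(\lambda_1\eta_{-1}^2+R_{-2}) = \lambda_1\alpha_{01}\sigma_{-1}^2\eta_{-1}^4+R_{-2}\eta_{-1}^2+R_{-2}.$$

Thus (8.50), taken at $t = 0$ and multiplied by $\sigma_0^2$, entails that 

$$\lambda_1\alpha_{01}\sigma_{-1}^2\eta_{-1}^4+R_{-2}\eta_{-1}^2+R_{-2} = 0 \quad\text{a.s.}$$

Solving this quadratic equation in $\eta_{-1}^2$ shows that either $\eta_{-1}^2 = R_{-2}$, which is impossible by 
arguments already given, or $\lambda_1\alpha_{01} = 0$. Let $\lambda_{2:m} = (\lambda_2,\dots,\lambda_m)'$. If $\lambda_1 = 0$, then (8.50) implies that 

$$\alpha_{01}\lambda_{2:m}'s_{-2:-m}\eta_{-1}^2 = -\mu_2\eta_{-1}^2+R_{-2} \quad\text{a.s.}$$

Taking the conditional expectation given $\sigma\{\eta_t, t \le -2\}$, it can be seen that $R_{-2} = \alpha_{01}\lambda_{2:m}'s_{-2:-m}+\mu_2$ 
in the previous equality. Thus we have 

$$\left(\alpha_{01}\lambda_{2:m}'s_{-2:-m}+\mu_2\right)\left(\eta_{-1}^2-1\right) = 0 \quad\text{a.s.},$$

which entails $\alpha_{01} = \mu_2 = 0$, because $P\{\lambda_2s_{-2}+\cdots+\lambda_ms_{-m} = 0\} < 1$ (see Exercise 8.12). For 
GARCH$(p,1)$ models, it is impossible to have $\alpha_{01} = 0$ by assumption A4. The invertibility of $D$ is 
thus shown in this case. In the general case, we show by induction that (8.50) entails $\alpha_{01} = \cdots = \alpha_{0q} = 0$. 

It is easy to show that $\hat D \to D$ in probability (and even almost surely) as $n\to\infty$. The 
conclusion follows. 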


8.7 Bibliographical Notes 


It is well known that when the parameter is at the boundary of the parameter space, the maximum 
likelihood estimator does not necessarily satisfy the first-order conditions and, in general, does not 
admit a limiting normal distribution. The technique, employed in particular by Chernoff (1954) 
and Andrews (1997) in a general framework, involves approximating the quasi-likelihood by a 
quadratic function, and defining the asymptotic distribution of the QMLE as that of the projection of 
a Gaussian vector on a convex cone. Particular GARCH models are considered by Andrews (1997, 
1999) and Jordan (2003). The general GARCH(p, q) case is considered by Francq and Zakoïan 
(2007). A proof of Theorem 8.1, when the moment assumption B7 is replaced by assumption B7' of 
Remark 8.2, can be found in the latter reference. When the nullity of GARCH coefficients is tested, 
the parameter is at the boundary of the parameter space under the null, and the alternative is 
one-sided. Numerous works deal with testing problems where, under the null hypothesis, the parameter 
is at the boundary of the parameter space. Such problems have been considered by Chernoff (1954), 
Bartholomew (1959), Perlman (1969) and Gouriéroux, Holly and Monfort (1982), among many 
others. General one-sided tests have been studied by, for instance, Rogers (1986), Wolak (1989), 
Silvapulle and Silvapulle (1995) and King and Wu (1997). Papers dealing more specifically with 
ARCH and GARCH models are Lee and King (1993), Hong (1997), Demos and Sentana (1998), 
Andrews (2001), Hong and Lee (2001), Dufour et al. (2004) and Francq and Zakoïan (2009b). 

The portmanteau tests based on the squared residual autocovariances were proposed by McLeod 
and Li (1983), Li and Mak (1994) and Ling and Li (1997). The results presented here closely follow 
Berkes, Horváth and Kokoszka (2003a). Problems of interest that are not studied in this book are 
the tests on the distribution of the iid process (see Horváth, Kokoszka and Teyssière, 2004; Horváth 
and Zitikis, 2006). 

Concerning the overrepresentation of the GARCH(1,1) model in financial studies, we mention 
Stărică (2006). This paper highlights, on a very long S&P 500 series, the poor performance of 
the GARCH(1,1) in terms of prediction and modeling, and suggests nonstationary dynamics for 
the returns. 


8.8 Exercises 


8.1 (Minimization of a distance under a linear constraint) 
Let $J$ be an $n\times n$ invertible matrix, let $x_0$ be a vector of $\mathbb R^n$, and let $K$ be a full-rank $p\times n$ 
matrix, $p < n$. Solve the problem of the minimization of $Q(x) = (x-x_0)'J(x-x_0)$ under 
the constraint $Kx = 0$. 


8.2 (Minimization of a distance when some components are equal to zero) 
Let $J$ be an $n\times n$ invertible matrix, $x_0$ a vector of $\mathbb R^n$ and $p < n$. Minimize $Q(x) = 
(x-x_0)'J(x-x_0)$ under the constraints $x_{i_1} = \cdots = x_{i_p} = 0$ ($x_i$ denoting the $i$th component of 
$x$, and assuming that $1 \le i_1 < \cdots < i_p \le n$). 
8.3 (Lagrangian or method of substitution for optimization with constraints) 
Compare the solutions (C.13) and (C.14) of the optimization problem of Exercise 8.2, with 

-1 
J= | -l 

Ne © 

and the constraints 
(a) $x_3 = 0$, 
(b) $x_2 = x_3 = 0$. 


8.4 (Minimization of a distance under inequality constraints) 
Find the minimum of the function 

$$\lambda = (\lambda_1,\lambda_2,\lambda_3)' \mapsto (\lambda-Z)'J(\lambda-Z), \qquad J = \begin{pmatrix}2&-1&0\\-1&2&1\\0&1&2\end{pmatrix},$$

under the constraints $\lambda_2 \ge 0$ and $\lambda_3 \ge 0$, when 
(i) $Z = (-2, 1, 2)'$, 
(ii) $Z = (-2, -1, 2)'$, 
(iii) $Z = (-2, 1, -2)'$, 
(iv) $Z = (-2, -1, -2)'$. 


8.5 (Influence of the positivity constraints on the moments of the QMLE) 
Compute the mean and variance of the vector $\lambda^\Lambda$ defined by (8.16). Compare these moments 
with the corresponding moments of $Z = (Z_1, Z_2, Z_3)'$. 


8.6 (Asymptotic distribution of the QMLE of an ARCH in the conditionally homoscedastic case) 
For an ARCH(q) model, compute the matrix $\Sigma$ involved in the asymptotic distribution of 
the QMLE in the case where all the $\alpha_{0i}$ are equal to zero. 


8.7 (Asymptotic distribution of the QMLE when an ARCH(1) is fitted to a strong white noise) 
Let $\hat\theta = (\hat\omega, \hat\alpha)'$ be the QMLE in the ARCH(1) model $\epsilon_t = \sqrt{\omega+\alpha\epsilon_{t-1}^2}\,\eta_t$ when the true 
parameter is equal to $(\omega_0, \alpha_0)' = (\omega_0, 0)'$ and when $\kappa_\eta := E\eta_t^4$. Give an expression for the 
asymptotic distribution of $\sqrt n(\hat\theta-\theta_0)$ with the aid of 

$$Z = (Z_1, Z_2)' \sim N\left\{0, \begin{pmatrix}\kappa_\eta\omega_0^2&-\omega_0\\-\omega_0&1\end{pmatrix}\right\}.$$

Compute the mean vector and the variance matrix of this asymptotic distribution. Determine 
the density of the asymptotic distribution of $\sqrt n(\hat\omega-\omega_0)$. Give an expression for the kurtosis 
coefficient of this distribution as a function of $\kappa_\eta$. 


8.8 (One-sided and two-sided tests have the same Bahadur slopes) 
Let $X_1,\dots,X_n$ be a sample from the $\mathcal N(\theta, 1)$ distribution. Consider the null hypothesis $H_0: 
\theta = 0$. Denote by $\Phi$ the $\mathcal N(0,1)$ cumulative distribution function. By the Neyman–Pearson 
lemma, we know that, for alternatives of the form $H_1: \theta > 0$, the one-sided test of rejection 
region 

$$C = \left\{n^{-1/2}\sum_{i=1}^nX_i > \Phi^{-1}(1-\alpha)\right\}$$

is uniformly more powerful than the two-sided test of rejection region 

$$C^* = \left\{\left|n^{-1/2}\sum_{i=1}^nX_i\right| > \Phi^{-1}(1-\alpha/2)\right\}$$

(moreover, $C$ is uniformly more powerful than any other test of level $\alpha$ or less). Although 
we have just seen that the test $C$ is superior to the test $C^*$ in finite samples, we will conduct 
an asymptotic comparison of the two tests, using the Bahadur and Pitman approaches. 

• The asymptotic Bahadur slope $c(\theta)$ is defined as the almost sure limit of $-2/n$ times the 
logarithm of the p-value under $P_\theta$, when the limit exists. Compare the Bahadur slopes of 
the two tests. 

• In the Pitman approach, we define a local power around $\theta = 0$ as being the power at $\tau/\sqrt n$. 
Compare the local powers of $C$ and $C^*$. Compare also the local asymptotic powers of the 
two tests for non-Gaussian samples. 


8.9 (The local asymptotic approach cannot distinguish the Wald, score and likelihood ratio tests) 
Let $X_1,\dots,X_n$ be a sample of the $\mathcal N(\theta, \sigma^2)$ distribution, where $\theta$ and $\sigma^2$ are unknown. 
Consider the null hypothesis $H_0: \theta = 0$ against the alternative $H_1: \theta > 0$. Consider the 
following three tests: 

• $C_1 = \{W_n > \chi_1^2(1-\alpha)\}$, where 

$$W_n = \frac{n\bar X_n^2}{S_n^2}, \qquad S_n^2 = \frac1n\sum_{i=1}^n(X_i-\bar X_n)^2 \quad\text{and}\quad \bar X_n = \frac1n\sum_{i=1}^nX_i,$$

is the Wald statistic; 

• $C_2 = \{R_n > \chi_1^2(1-\alpha)\}$, where 

$$R_n = \frac{n\bar X_n^2}{n^{-1}\sum_{i=1}^nX_i^2}$$

is the Rao score statistic; 

• $C_3 = \{L_n > \chi_1^2(1-\alpha)\}$, where 

$$L_n = n\log\frac{n^{-1}\sum_{i=1}^nX_i^2}{S_n^2}$$

is the likelihood ratio statistic. 

Give a justification for these three tests. Compare their local asymptotic powers and their 
Bahadur slopes. 


8.10 (The Wald and likelihood ratio statistics have the same asymptotic distribution) 
Consider the case $d = 1$, that is, the framework of Section 8.3.3 where only one coefficient 
is equal to zero. Without using Remark 8.3, show that the asymptotic laws $W$ and $L$ defined 
by (8.22) and (8.24) are such that 
8.11 (For testing conditional homoscedasticity, the Wald and likelihood ratio statistics have the 
same asymptotic distribution) 
Repeat Exercise 8.10 for the conditional homoscedasticity test (8.26) in the ARCH(q) case. 


8.12 (The product of two independent random variables is null if and only if one of the two variables 
is null) 
Let $X$ and $Y$ be two independent random variables such that $XY = 0$ almost surely. Show 
that either $X = 0$ almost surely or $Y = 0$ almost surely. 


Optimal Inference and Alternatives to the QMLE* 


The most commonly used estimation method for GARCH models is the QML method studied in 
Chapter 7. One of the attractive features of this method is that the asymptotic properties of the 
QMLE are valid under mild assumptions. In particular, no moment assumption is required on the 
observed process in the pure GARCH case. However, the QML method has several drawbacks, 
motivating the introduction of alternative approaches. These drawbacks are the following: (i) the 
estimator is not explicit and requires a numerical optimization algorithm; (ii) the asymptotic 
normality of the estimator requires the existence of a moment of order 4 for the noise $\eta_t$; (iii) the 
QMLE is inefficient in general; (iv) the asymptotic normality requires the existence of moments 
for $\epsilon_t$ in the general ARMA-GARCH case; (v) a complete parametric specification is required. 

In the ARCH case, the QLS estimator defined in Section 6.2 addresses point (i) satisfactorily, 
at the cost of additional moment conditions. The maximum likelihood (ML) estimator studied in 
Section 9.1 of this chapter provides an answer to points (ii) and (iii), but it requires knowledge of 
the density $f$ of $\eta_t$. Indeed, it will be seen that adaptive estimators for the set of all the parameters 
do not exist in general semi-parametric GARCH models. Concerning point (iii), it will be seen that 
the QMLE can sometimes be optimal outside of the trivial case where $f$ is Gaussian. In Section 9.2, the 
ML estimator will be studied in the (quite realistic) situation where $f$ is misspecified. It will also 
be seen that the so-called local asymptotic normality (LAN) property allows us to show the local 
asymptotic optimality of test procedures based on the ML. In Section 9.3, less standard estimators 
are presented, in order to address some of the points (i)–(v). 

In this chapter, we focus on the main principles of the estimation methods and do not give all 
the mathematical details. Precise regularity conditions justifying the arguments used can be found 
in the references that are given throughout the chapter or in Section 9.4. 


9.1 Maximum Likelihood Estimator 


In this section, the density $f$ of the strong white noise $(\eta_t)$ is assumed known. This assumption 
is obviously very strong and the effect of the misspecification of $f$ will be examined in the 
next section. Conditionally on the $\sigma$-field $\mathcal F_{t-1}$ generated by $\{\epsilon_u : u < t\}$, the variable $\epsilon_t$ has 
the density $x \mapsto \sigma_t^{-1}f(x/\sigma_t)$. It follows that, given the observations $\epsilon_1,\dots,\epsilon_n$ and the initial 
values $\epsilon_0,\dots,\epsilon_{1-q}, \tilde\sigma_0^2,\dots,\tilde\sigma_{1-p}^2$, the conditional likelihood is defined by 

$$L_{n,f}(\theta) = L_{n,f}(\theta;\epsilon_1,\dots,\epsilon_n) = \prod_{t=1}^n\frac{1}{\tilde\sigma_t}f\left(\frac{\epsilon_t}{\tilde\sigma_t}\right),$$

where the $\tilde\sigma_t^2$ are recursively defined, for $t \ge 1$, by 

$$\tilde\sigma_t^2 = \tilde\sigma_t^2(\theta) = \omega+\sum_{i=1}^q\alpha_i\epsilon_{t-i}^2+\sum_{j=1}^p\beta_j\tilde\sigma_{t-j}^2. \qquad (9.1)$$

A maximum likelihood estimator (MLE) is obtained by maximizing the likelihood on a compact 
subset $\Theta^*$ of the parameter space. Such an estimator is denoted by $\hat\theta_{n,f}$. 


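9.1.1 Asymptotic Behavior 


Under the above-mentioned regularity assumptions, the initial conditions are asymptotically 
negligible and, using the ergodic theorem, we have almost surely 

$$\lim_{n\to\infty}\frac1n\log\frac{L_{n,f}(\theta)}{L_{n,f}(\theta_0)} = E_{\theta_0}\log\left\{\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\frac{f\left(\eta_t\sigma_t(\theta_0)/\sigma_t(\theta)\right)}{f(\eta_t)}\right\} \le \log E_{\theta_0}\left\{\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\frac{f\left(\eta_t\sigma_t(\theta_0)/\sigma_t(\theta)\right)}{f(\eta_t)}\right\} = 0,$$

using Jensen's inequality and the fact that 

$$E_{\theta_0}\left\{\frac{\sigma_t(\theta_0)}{\sigma_t(\theta)}\frac{f\left(\eta_t\sigma_t(\theta_0)/\sigma_t(\theta)\right)}{f(\eta_t)}\,\middle|\,\mathcal F_{t-1}\right\} = \int\frac{1}{\sigma_t(\theta)}f\left(\frac{x}{\sigma_t(\theta)}\right)dx = 1.$$

Adapting the proof of the consistency of the QMLE, it can be shown that $\hat\theta_{n,f} \to \theta_0$ almost surely 
as $n\to\infty$. 

Assuming in particular that $\theta_0$ belongs to the interior of the parameter space, a Taylor expansion 
yields 

$$0 = \frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\log L_{n,f}(\hat\theta_{n,f}) = \frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\log L_{n,f}(\theta_0) + \frac1n\frac{\partial^2}{\partial\theta\,\partial\theta'}\log L_{n,f}(\theta_0)\sqrt n(\hat\theta_{n,f}-\theta_0) + o_P(1). \qquad (9.2)$$

We have 

$$\frac{\partial}{\partial\theta}\log L_{n,f}(\theta_0) = -\sum_{t=1}^n\frac{1}{2\sigma_t^2}\left\{1+\eta_t\frac{f'(\eta_t)}{f(\eta_t)}\right\}\frac{\partial\sigma_t^2}{\partial\theta}(\theta_0) =: \sum_{t=1}^n\nu_t. \qquad (9.3)$$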


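It is easy to see that $(\nu_t, \mathcal F_t)$ is a martingale difference (using, for instance, the computations of 
Exercise 9.1). It follows that 

$$S_{n,f}(\theta_0) := \frac{1}{\sqrt n}\frac{\partial}{\partial\theta}\log L_{n,f}(\theta_0) \xrightarrow{d} N(0, \mathfrak I), \qquad (9.4)$$

where $\mathfrak I$ is the Fisher information matrix, defined by 

$$\mathfrak I = E\nu_1\nu_1' = \frac{\varsigma_f}{4}J, \qquad J = E\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\theta_0), \qquad \varsigma_f = \int\left\{1+\frac{f'(x)}{f(x)}x\right\}^2f(x)\,dx.$$

Note that $\varsigma_f$ is equal to $\sigma^2$ times the Fisher information for the scale parameter $\sigma > 0$ of the 
densities $\sigma^{-1}f(\cdot/\sigma)$. When $f$ is the $\mathcal N(0,1)$ density, we thus have $\varsigma_f = \sigma^2\times2/\sigma^2 = 2$. 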
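We now turn to the other terms of the Taylor expansion (9.2). Let 

$$\hat{\mathfrak I}(\theta) = -\frac1n\frac{\partial^2}{\partial\theta\,\partial\theta'}\log L_{n,f}(\theta).$$

We have 

$$\hat{\mathfrak I}(\theta_0) = \mathfrak I+o_P(1), \qquad (9.5)$$

thus, using the invertibility of $\mathfrak I$, 

$$n^{1/2}(\hat\theta_{n,f}-\theta_0) \xrightarrow{d} N(0, \mathfrak I^{-1}). \qquad (9.6)$$

Note that 

$$\hat\theta_{n,f} = \theta_0+\mathfrak I^{-1}S_{n,f}(\theta_0)/\sqrt n+o_P(n^{-1/2}). \qquad (9.7)$$

With the previous notation, the QMLE has the asymptotic variance 

$$(E\eta_t^4-1)J^{-1} = (E\eta_t^4-1)\left\{E\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\theta_0)\right\}^{-1} = \frac{(E\eta_t^4-1)\varsigma_f}{4}\mathfrak I^{-1}. \qquad (9.8)$$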

The following proposition shows that the QMLE is not only optimal in the Gaussian case. 


Proposition 9.1 (Densities ensuring the optimality of the QMLE) Under the previous 
assumptions, the QMLE has the same asymptotic variance as the MLE if and only if the density of 
$\eta_t$ is of the form 

$$f(y) = \frac{a^a}{\Gamma(a)}\exp(-ay^2)|y|^{2a-1}, \qquad a > 0, \qquad \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt. \qquad (9.9)$$

Proof. Given the asymptotic variances of the ML and QML estimators, it suffices to show that 

$$(E\eta_t^4-1)\varsigma_f \ge 4, \qquad (9.10)$$

with equality if and only if $f$ satisfies (9.9). In view of Exercise 9.2, we have 

$$\int y^2\left\{1+\frac{f'(y)}{f(y)}y\right\}f(y)\,dy = -2,$$

and, since $\int\{1+yf'(y)/f(y)\}f(y)\,dy = 0$, the same integral with $y^2$ replaced by $y^2-1$ is also equal to $-2$. 
The Cauchy–Schwarz inequality then entails that 

$$4 \le \int(y^2-1)^2f(y)\,dy\int\left\{1+\frac{f'(y)}{f(y)}y\right\}^2f(y)\,dy = (E\eta_t^4-1)\varsigma_f,$$

with equality if and only if there exists $a \ne 0$ such that $1+\eta_tf'(\eta_t)/f(\eta_t) = -2a(\eta_t^2-1)$ a.s. 
This occurs if and only if $f'(y)/f(y) = -2ay+(2a-1)/y$ almost everywhere. Under the 
constraints $f \ge 0$ and $\int f(y)\,dy = 1$, the solution of this differential equation is (9.9). 


Note that when $f$ is of the form (9.9) then we have 

$$\log L_{n,f}(\theta) = -a\sum_{t=1}^n\left(\frac{\epsilon_t^2}{\tilde\sigma_t^2}+\log\tilde\sigma_t^2\right)$$

up to a constant which does not depend on $\theta$. It follows that in this case the MLE coincides with 
the QMLE, which entails the sufficient part of Proposition 9.1. 
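The equality case of Proposition 9.1 can be checked numerically. Under (9.9), $\eta_t^2$ is Gamma distributed with shape $a$ and rate $a$. Hence $E\eta_t^2 = 1$ and $E\eta_t^4 - 1 = 1/a$, while $1 + yf'(y)/f(y) = 2a(1-y^2)$ gives $\varsigma_f = 4a$. The product is therefore exactly 4. The sketch below verifies this with a plain midpoint rule and a numerical log-derivative rather than the closed forms. The grid, the truncation point and the restriction to $a \ge 1/2$ (so the integrands are bounded near 0) are assumptions of the sketch. The function names are illustrative.

```python
import math

def log_density_99(y, a):
    """log f(y) for the density (9.9): a^a / Gamma(a) * exp(-a y^2) * |y|^(2a-1), y != 0."""
    return a * math.log(a) - math.lgamma(a) - a * y * y + (2 * a - 1) * math.log(abs(y))

def moments_99(a, upper=12.0, steps=20_000):
    """Midpoint-rule approximations of (int f, E eta^2, E eta^4, varsigma_f).

    f is symmetric, so each integral over the real line is twice the integral
    over (0, upper); the tail beyond `upper` is negligible for moderate a.
    Intended for a >= 1/2, where the integrands have no singularity at 0.
    """
    h = upper / steps
    mass = m2 = m4 = vs = 0.0
    for k in range(steps):
        y = (k + 0.5) * h                    # midpoints never hit y = 0
        f = math.exp(log_density_99(y, a))
        dy = 1e-6 * y                        # relative step for the log-derivative
        # central difference of log f approximates f'(y)/f(y)
        score = (log_density_99(y + dy, a) - log_density_99(y - dy, a)) / (2 * dy)
        mass += f
        m2 += y ** 2 * f
        m4 += y ** 4 * f
        vs += (1.0 + y * score) ** 2 * f
    return tuple(2.0 * h * v for v in (mass, m2, m4, vs))
```

For $a = 1/2$ this recovers the Gaussian values $E\eta_t^4 = 3$ and $\varsigma_f = 2$. For any $a$ in the family, $(E\eta_t^4-1)\varsigma_f$ comes out numerically equal to 4, the lower bound in (9.10).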


9.1.2 One-Step Efficient Estimator 


Figure 9.1 shows the graph of the family of densities for which the QMLE and MLE coincide 
(and thus for which the QMLE is efficient). When the density $f$ does not belong to this family of 
distributions, we have $\varsigma_f\left(\int x^4f(x)\,dx-1\right) > 4$, and the QMLE is asymptotically inefficient in the 
sense that 

$$\operatorname{Var}_{as}\sqrt n(\hat\theta_n-\theta_0)-\operatorname{Var}_{as}\sqrt n(\hat\theta_{n,f}-\theta_0) = \left\{E\eta_t^4-1-\frac{4}{\varsigma_f}\right\}J^{-1}$$

is positive definite. Table 9.1 shows that the efficiency loss can be important. 
An efficient estimator can be obtained from a simple transformation of the QMLE, using the 
following result (which is intuitively true by (9.7)). 


Figure 9.1 Density (9.9) for different values of $a > 0$. When $\eta_t$ has this density, the QMLE and 
MLE have the same asymptotic variance. 
Table 9.1 Asymptotic relative efficiency (ARE) of the MLE with respect to the QMLE, 
$\operatorname{Var}_{as}\hat\theta_n/\operatorname{Var}_{as}\hat\theta_{n,f}$, when $f(y) = \sqrt{\nu/(\nu-2)}\,f_\nu\left(y\sqrt{\nu/(\nu-2)}\right)$, where $f_\nu$ denotes the Student t 
density with $\nu$ degrees of freedom. 

ν     5     6     7     8      9     10     20     30       ∞ 
ARE   5/2   5/3   7/5   14/11  6/5   15/13  95/92  145/143  1 
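The entries of Table 9.1 follow from the ratio $(E\eta_t^4-1)\varsigma_f/4$ of the two asymptotic variances in (9.8). For the standardized Student t distribution with $\nu > 4$ degrees of freedom, $E\eta_t^4 = 3(\nu-2)/(\nu-4)$ and $\varsigma_f = 2\nu/(\nu+3)$. This gives $\nu(\nu-1)/\{(\nu+3)(\nu-4)\}$. A short exact-arithmetic sketch (the function name is ours):

```python
from fractions import Fraction

def are_student(nu):
    """Exact ARE (E eta^4 - 1) * varsigma_f / 4 for standardized Student t noise, nu > 4."""
    if nu <= 4:
        raise ValueError("E eta^4 is infinite when nu <= 4")
    kurtosis = Fraction(3 * (nu - 2), nu - 4)   # E eta^4
    varsigma = Fraction(2 * nu, nu + 3)         # sigma^2 times the Fisher information for scale
    return (kurtosis - 1) * varsigma / 4

print([str(are_student(nu)) for nu in (5, 6, 7, 8, 9, 10, 20, 30)])
# ['5/2', '5/3', '7/5', '14/11', '6/5', '15/13', '95/92', '145/143']
```

As $\nu \to \infty$, the ratio tends to 1, the Gaussian case in which the QMLE is the MLE.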


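Proposition 9.2 (One-step efficient estimator) Let $\tilde\theta_n$ be a preliminary estimator of $\theta_0$ such that 
$\sqrt n(\tilde\theta_n-\theta_0) = O_P(1)$. Under the previous assumptions, the estimator defined by 

$$\bar\theta_{n,f} = \tilde\theta_n+\hat{\mathfrak I}^{-1}(\tilde\theta_n)S_{n,f}(\tilde\theta_n)/\sqrt n$$

is asymptotically equivalent to the MLE: 

$$\sqrt n(\bar\theta_{n,f}-\theta_0) \xrightarrow{d} N(0, \mathfrak I^{-1}).$$

Proof. A Taylor expansion of $S_{n,f}(\cdot)$ around $\theta_0$ yields 

$$S_{n,f}(\tilde\theta_n) = S_{n,f}(\theta_0)-\mathfrak I\sqrt n(\tilde\theta_n-\theta_0)+o_P(1).$$

We thus have 

$$\sqrt n(\bar\theta_{n,f}-\theta_0) = \sqrt n(\bar\theta_{n,f}-\tilde\theta_n)+\sqrt n(\tilde\theta_n-\theta_0) = \mathfrak I^{-1}S_{n,f}(\tilde\theta_n)+\mathfrak I^{-1}\left\{S_{n,f}(\theta_0)-S_{n,f}(\tilde\theta_n)\right\}+o_P(1)$$
$$= \mathfrak I^{-1}S_{n,f}(\theta_0)+o_P(1) \xrightarrow{d} N(0, \mathfrak I^{-1}),$$

using (9.4). 


In practice, one can take the QMLE as a preliminary estimator: $\tilde\theta_n = \hat\theta_n$. 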
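The one-step construction can be illustrated on the simplest possible volatility model. The sketch below assumes $\epsilon_t = \sqrt{\omega}\,\eta_t$ with i.i.d. standardized Student t noise, a degenerate ARCH(0) case chosen so that the QMLE $\hat\omega = n^{-1}\sum\epsilon_t^2$ is explicit. For this density, $1 + yf'(y)/f(y) = 1-(\nu+1)y^2/(\nu-2+y^2)$, and the score for $\omega$ is $-\{1+\eta f'(\eta)/f(\eta)\}/(2\omega)$. The estimate $\hat{\mathfrak I}(\tilde\theta_n)$ in Proposition 9.2 is the observed information. In its place, the sketch uses the expected information $\varsigma_f/(4\omega^2)$ evaluated at $\hat\omega$. Both are consistent for $\mathfrak I$, which is all the proof uses. This is a sketch under these assumptions, not the procedure used for Table 9.2.

```python
import numpy as np

def one_step_scale(eps, nu):
    """QMLE and one-step estimator of omega in eps_t = sqrt(omega) * eta_t.

    Assumes eta_t i.i.d. standardized Student t with nu > 4 degrees of freedom.
    """
    eps = np.asarray(eps, dtype=float)
    omega_q = np.mean(eps ** 2)                              # QMLE, used as preliminary estimator
    eta = eps / np.sqrt(omega_q)
    psi = 1.0 - (nu + 1.0) * eta ** 2 / (nu - 2.0 + eta ** 2)  # 1 + eta f'(eta)/f(eta)
    score = np.mean(-psi / (2.0 * omega_q))                  # n^{-1} d/d omega log L at omega_q
    info = (2.0 * nu / (nu + 3.0)) / (4.0 * omega_q ** 2)    # varsigma_f / (4 omega^2) at omega_q
    return omega_q, omega_q + score / info                   # one Newton-type step
```

Both returned values are consistent for $\omega$. By Proposition 9.2 and Table 9.1, the second should be more accurate asymptotically, by a factor close to the ARE for the chosen $\nu$.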


Example 9.1 (QMLE and one-step MLE) $N = 1000$ independent samples of length $n = 100$ 
and $1000$ were simulated for an ARCH(1) model with parameters $\omega = 0.2$ and $\alpha = 0.9$, where the 
distribution of the noise $\eta_t$ is the standardized Student t given by $f(y) = \sqrt{\nu/(\nu-2)}\,f_\nu\left(y\sqrt{\nu/(\nu-2)}\right)$ 
($f_\nu$ denoting the Student density with $\nu$ degrees of freedom). Table 9.2 summarizes the estimation 
results of the QMLE $\hat\theta_n$ and of the efficient estimator $\bar\theta_{n,f}$. This table shows that the one-step 
estimator $\bar\theta_{n,f}$ is, for this example, always more accurate than the QMLE. The observed relative 
efficiency is close to the theoretical ARE computed in Table 9.1. 


9.1.3 Semiparametric Models and Adaptive Estimators 


In general, the density $f$ of the noise is unknown, but $f$ and $f'$ can be estimated from the 
normalized residuals $\hat\eta_t = \epsilon_t/\tilde\sigma_t(\hat\theta_n)$, $t = 1,\dots,n$ (for instance, using a kernel nonparametric estimator). 
The estimator $\hat\theta_{n,\hat f}$ (or the one-step estimator $\bar\theta_{n,\hat f}$) can then be utilized. This estimator is said to 
be adaptive if it inherits the efficiency property of $\hat\theta_{n,f}$ for any value of $f$. In general, it is not 
possible to estimate all the GARCH parameters adaptively. 

Take the ARCH(1) example 

$$\epsilon_t = \sigma_t\eta_t, \quad \eta_t \sim f_\lambda, \qquad \sigma_t^2 = \omega+\alpha\epsilon_{t-1}^2, \qquad (9.11)$$
Table 9.2 QMLE and efficient estimator $\bar\theta_{n,f}$ on $N = 1000$ realizations of the ARCH(1) model 
$\epsilon_t = \sigma_t\eta_t$, $\sigma_t^2 = \omega+\alpha\epsilon_{t-1}^2$, $\omega = 0.2$, $\alpha = 0.9$, $\eta_t \sim f(y) = \sqrt{\nu/(\nu-2)}\,f_\nu\left(y\sqrt{\nu/(\nu-2)}\right)$. The last 
column gives the estimated ARE obtained from the ratio of the MSE of the two estimators on the 
N realizations. 

                         QMLE θ̂_n            θ̄_{n,f} 
ν    n      θ0          Mean    RMSE       Mean    RMSE       ARE 
5    100    ω = 0.2     0.202   0.0794     0.211   0.0646     1.51 
            α = 0.9     0.861   0.5045     0.857   0.3645     1.92 
     1000   ω = 0.2     0.201   0.0263     0.201   0.0190     1.91 
            α = 0.9     0.897   0.1894     0.886   0.1160     2.67 
6    100    ω = 0.2     0.212   0.0816     0.215   0.0670     1.48 
            α = 0.9     0.837   0.3852     0.845   0.3389     1.29 
     1000   ω = 0.2     0.202   0.0235     0.202   0.0186     1.61 
            α = 0.9     0.889   0.1384     0.888   0.1060     1.70 
20   100    ω = 0.2     0.207   0.0620     0.209   0.0619     1.00 
            α = 0.9     0.847   0.2899     0.845   0.2798     1.07 
     1000   ω = 0.2     0.199   0.0170     0.199   0.0165     1.06 
            α = 0.9     0.899   0.0905     0.898   0.0885     1.05 

For ν = 5, 6 and 20 the theoretical AREs are respectively 2.5, 1.67 and 1.03 (for α and ω). 


where n; has the double Weibull density 
À hat à 
A@) = zll exp(—|x|"), A>0. 


The subscript 0 is added to signify the true values of the parameters. The parameter 9o = (6), Ao)’, 
where 69 = (wo, Qo)’, is estimated by maximizing the likelihood of the observations €),..., €n 
conditionally on the initial value €9. In view of (9.3), the first two components of the score are 
given by 


0 —~ l f (nt) do; 
— lo Ln, ; (Vo) = Oe fı + n (Vo), 
age La l!t eeu w 


with 


fin) | : T ( 1 ) 
1 tt = ào (1 — In|“), — = . 
| Eao tf = Pot = Il) 30 e, 


The last component of the score is 


n 


a 1 ; 
DA log Ln, f, 80) = 5D P + (1 = In|”) tog ni} . 


t=1 
Note that 
Tg = E {40 (1 —Iml)}° = 25, 


1 2 l-2y+y?+77/3 
Et}— 4+(1—In,/*°) log a ee 
{>, ( Inl ) tog nif e 


E | {20 (1 —|nP°)} {>,+ (1- inl) og inf =1-y, 
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where y = 0.577... is the Euler constant. It follows that the score satisfies 


Z ð £ J Jiz 
12 . _ 11 12 
n a log Ln, f (80) —> nfo, J= ( Y, J» )} ; 


where 


a 1 & -1 1 1 
my =e ( 2 A in = r 2 i: 
A(wo + oE \ Ga E 2 aptager, \ &1 


and Jn = A” (1 —2y+y?+n? /6). By the general properties of an information matrix (see 
Exercise 9.4 for a direct verification), we also have 


2 


avav’ 


n! 


log Ln, f (90) > —I a.s. as n — oo. 


The information matrix J being such that Jı2 Æ 0, the necessary Stein’s condition (see 
Bickel, 1982) for the existence of an adaptive estimator is not satisfied. The intuition behind this 
condition is the following. In view of the previous discussion, the asymptotic variance of the 
MLE of o should be of the form 
11 12 
a =( a za I; 


When Ao is unknown, the optimal asymptotic variance of a regular estimator of 6 is thus 3!!. 
Knowing Ao, the asymptotic variance of the MLE of 6ọ is ire If there exists an adaptive estimator 
for the class of the densities of the form f, (or for a larger class of densities), then we have 


3! = 571. Since J!! = (31 mF J1) (see Exercise 6.7), this is possible only if J12 = 0, 
which is not the case here. 

Reparameterizing the model, Drost and Klaassen (1997) showed that it is, however, possible 
to obtain adaptative estimates of certain parameters. To illustrate this point, return to the ARCH(1) 
example with the parameterization 


E = CO, m~ fr 
| o =1+ae}]. ve) 


Let V = (œ, c, A) be an element of the parameter space. The score now satisfies 


n 


2 oe A = ho 2 
zg 108 Ln, f (Po) = 2 o7 {ào (1 = Inel’) Fea, 


ð 
— log Ln, , (80) = — X — {a0 (1 — Im1*)}, 


dc Co 
t=1 0 


a “fl ‘ 
y 08 ba. fi Po) = 5 P + (1 — [n:l o) login} . 


t=1 


Thus n-"/4 log Ly, f (8o)/80 > MN (0, 3) with 


40 20 aly 
4 A 2c, 2 B 
A 4g a l-y 
J= Zeo zZ -o , 
hyp lv 1—2y+y?+7?/6 
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where <4 2 
Asp E Ba pnl 
7? è 
(14+ ave? 4) L+ ae 


It can be seen that this matrix is invertible because its determinant is equal to mw Aa(A — 
B?) /24c2 > 0. Moreover, 


4 _ __2coB 0 
à0(4—B?) à0(4—B?) 
2 2 =y LERZ —a7)2 
gta] 2a deers Hea] sway 
à6(A—B?) 175 (A—B?) n? 
0 6ey(I=y) sag 
x? x 


The MLE enjoying optimality properties in general, when ào is unknown, the optimal variance of 
an estimator of (œo, co) should be equal to 


4 2coB 
TED E= 
Xm = oni [4{x2+60-7)?} -6821-7)?| 
49(A—B?) 722(A—B2) 


When Ag is known, a similar calculation shows that the MLE of (ao, co) should have the asymp- 
totic variance 


2 2 =] x 
104 20 a == 
5 |3 Teo _ 43(A—B2) 45(A-B?) 
ML|ky = a a = 2coB chA 
Zco a (A-B?) A (A—B?) 


We note that E mLo(l, 1) = Lux, 1). Thus, in presence of the unknown parameter c, the MLE 
of a is equally accurate when Ag is known or unknown. This is not particular to the chosen 
form of the density of the noise, which leads us to think that there might exist an estimator of 
ao that adapts to the density f of the noise (in presence of the nuisance parameter c). Drost and 
Klaassen (1997) showed the actual existence of adaptive estimators for some parameters of an 
extension of (9.12). 


9.1.4 Local Asymptotic Normality 


In this section, we will see that the GARCH model satisfies the so-called LAN property, which 
has interesting consequences for the local asymptotic properties of estimators and tests. Let 6, = 


6 +h, /./n be a sequence of local parameters around the parameter 0 cO, where (hn) is a bounded 
sequence of R?+¢+!. Consider the local log-likelihood ratio function 


Ln, f On) 


hyn —> An, ¢ (On, 0) := log ———. 
Ly,¢) 


The Taylor expansion of this function around 0 leads to 
1 
An, FO + hn / Vn, 0) = hi, Sn, ¢ (0) — zn IO)hn + op, (1), (9.13) 
where, as we have already seen, 


Sy,p(0) —> N10, I(0)} under Pp. (9.14) 
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It follows that 


Py (1) 


o 1 
An tO ga hn/Jn, 0) ~ N(-3% r) , Tn = h I(@)hn. 


Remark 9.1 (Limiting Gaussian experiments) Denoting by L(h) the M { h, I=! (6)} density 
evaluated at the point X, we have 


Ath, 0) = tog ® = 1x -WIO — Lig 
(h, 0) := STO — 5 ) ( +3 


1 1 
= — hh +h'IX ~ N (-ZWIOh, WIO) 


under the null hypothesis that X ~ M {0, I!(0)}. It follows that the local log-likelihood ratio 
An, f0 +h /Jn, 0) of n observations converges in law to the likelihood ratio A(h,0) of one 
Gaussian observation X ~ VV { h, 37! (@)}. Using Le Cam’s terminology, we say that the ‘sequence 
of local experiments’ {Ly,¢(0+h/J/n),h € IR?*4*1) converges to the Gaussian experiment 
{N(h, IT! (0)), h € RPI}, 


The property (9.13)—(9.14) is called LAN. It entails that the MLE is locally asymptotically 
optimal (in the minimax sense and in various other senses; see van der Vaart, 1998). The LAN 
property also makes it very easy to compute the local asymptotic distributions of statistics, or the 
asymptotic local powers of tests. As an example, consider tests of the null hypothesis 


Ao : Xq = Q0 > 0 
against the sequence of local alternatives 
Ay, : Ag = ly + c/ Jn. 


The performance of the Wald, score and of likelihood ratio tests will be compared. 


Wald Test Based on the MLE 


Let @,¢ be the (q + 1)th component of the MLE 6n, f- In view of (9.7) and (9.13)—(9.14), we 
have under Hp that 


1 
Jn(&q, f — ao) _ gur eG 
( An.f Oo + h/ Jn, 6) ) = \ pay- Ke ) toO, ae 


where X ~ M(0, l+ p+q), and e; denotes the ith vector of the canonical basis of Rett] noting 


that the (q + 1)th component of 60 cO is equal to ap. Consequently, the asymptotic distribution 
of the vector defined in (9.15) is 


0 $ ~—l $ 
n{( 5K ); ( egri? Cat Egh )} under Hp. (9.16) 
-43 h'Jh 


h'eq+ı 


Le Cam’s third lemma (see van der Vaart, 1998, p. 90; see also Exercise 9.3 below) and the 
contiguity of the probabilities Py, and Po+a/yn (implied by the LAN property (9.13)-(9.14)) 
show that, for en4ih =C, 


a L = 
Jn(&q, f — a0) —> N{c, Cid teat} under H,. 


228 GARCH MODELS 
The Wald test (and also the ¢ test) is defined by the rejection region {W,, ¢ > xa —a)} where 
Wi, f = (Aq, f — oo)” {e137 On, peqat} 


and xa — a) denotes the (1 — œ)-quantile of a chi-square distribution with 1 degree of freedom. 
This test has asymptotic level œ and local asymptotic power c œ> 1 — ®, { x7 ( 1- a)}, where ®,(-) 
denotes the cumulative distribution function of a noncentral chi-square with 1 degree of freedom 


and noncentrality parameter! 
2 
2 


ri f=] * 
egi eq+1 


This test is locally asymptotically uniformly most powerful among the asymptotically 
unbiased tests. 


Score Test Based on the MLE 


The score (or Lagrange multiplier) test is based on the statistic 


1 Alog Ln (Oo An 3log Ln, pO +) 
phe S516 ,) ar (9.17) 


R, = 
lS a0 nf a0 


where ĝe f is the MLE under Ab, that is, constrained by the condition that the (q + 1)th component 
of the estimator is equal to ag. By the definition of ĝe po We have 


ð log Ln, fÊ p) 


0, i ii, 1 
26, >» i#q+ (9.18) 


In view of (9.17) and (9.18), the test statistic can be written as 


1 |- log Ln, ¢ (0% +) 
n 


2 
= 1 Â—l1;ĝc 
R, f=- e | ed OF pegtt- (9.19) 


Under Ho, almost surely 6c p= 0o and bn. f — %. Consequently, 


l ð log Ln, (On, f) op(l) cs ð log Ln, f (80) 


0 = 
Jn 30 Jn 30 


= 3/nOn, f = o) 


and "i 
1 3log Ln, f(O f) opa) 1 3log Ln, Go) 


Vn a0 Jno d 


Taking the difference, we obtain 


— IVn (Ô; p — 60). 


Ta 90 IVN Ôn, f = Grp) (9.20) 


op) A Ac A Ac 
2 JOS pnn s — OF 7) 


' By definition, if Z;,..., Z, are independently and normally distributed with variance 1 and means 


Pilg ieas mx, then aa z? follows a noncentral chi-square with k degrees of freedom and noncentrality 
2 


i 


k 
parameter $`; m 
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which, using (9.17), gives 


op(1) 


Rip = nnp — ÂE SOS Gn ¢ ÂE p). (9.21) 


From (9.20), we obtain . 
=f 1 9 log Ln, fOr, p) 


Jn 30 


Using this relation, ez ies f = 20 and (9.18), it follows that 


vVnÔn, f _ 6. 7) es J 


P op(l) AE 1 ð log Ln, p (Oe ) 
Saang — 00) E (e @o)eq41} =—— n 


Jn 06q+1 
Using (9.19), we have 


op(l) (ân f — 0)? opi) 


Rf — = W, f under Ho. (9.22) 
e p13 (On peq 


By Le Cam’s third lemma, the score test thus inherits the local asymptotic optimality properties 
of the Wald test. 
Likelihood Ratio Test Based on the MLE 


The likelihood ratio test is based on the statistic L,, f = 2An,¢ On, f. 6c E The Taylor expansion 
of the log-likelihood around 6, s leads to 


Ac op(\) A Ac A , ð log Ln, n, ) 
log Ln, fn, f) = log Ln, ¢ (On, f) TF (C iz On f) — 


1 a? log Ln, ¢On,f) 


1 x A z P 

=O p — On : Or. — On , 
ia nf f) 3030’ ( nf Sf) 
thus, using 3 log Ln, ¢ On, ¢)/90 = 0, (9.5) and (9.21), 


dQ) on A simeehe A 0) 
Ln s = nÔ; p — Ôn, I ¢ — Ên, 7) °D Raf 


under Hp and H,,. It follows that the three tests exhibit the same asymptotic behavior, both under 
the null hypothesis and under local alternatives. 


Tests Based on the QML 


We have seen that the W,, f, Rn, ¢ and L,, ¢ tests based on the MLE are all asymptotically equivalent 
under Ho and H, (in particular, they are all asymptotically locally optimal). We now compare these 
tests to those based on the QMLE, focusing on the QML Wald whose statistic is 


Wr = (Gq — 00)" e1 Fy) Oneqt. 


where @, is the (q + 1)th component of the QML Ên, and A (8) is 


A If è * (124 1 aoao | 
3/6) =- L -1| -) ———@ 
w ©) a; adaa 30 307 O 


or an asymptotically equivalent estimator. 
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Remark 9.2 Obviously, Jy should not be estimated by the analog of (9.5), 


— (bn), (9.23) 


ti» 


B Ly Blog Lath) op) Le a 007 3o? 
n 0006! E 2n — 4 30 a6! 


nor by the empirical variance of the (pseudo-)score vector 
& 1 i ð log Ln (6n) ə log Ly (6) 
Ji := — SO 
ne Ga 36" 


n 2 2 2 
op) 1 A 1 ðof ðof a 
= — 1 — -Ra = Ar. On , 9.24 
( =a) 4 06 a0" ) Ce 


which does not always converge to Jw, when the observations are not Gaussian. 
Remark 9.3 The score test based on the QML can be defined by means of the statistic 


ee: 
1 { dlogL, (6%) ee 
R, = F [ea ear Jy Oleg; 
q 


denoting by ĝe the QML constrained by ĝâ4 = a. Similarly, we define the likelihood ratio test 
statistic L, based on the QML. Taylor expansions similar to those previously used show that 


1 1 
W, SE R, oul L, under Ho and under H,,. 


Recall that 


=1 = n s1 2 
Vin — 8) "E ale l a = | = IRAE 


of “00 00’ n 207 of 00 
t=1 
2 2 
op(l) or 571 ef | 90; 
> sf — (o). 
2 à + Qa? =|! - 5} 99 


Using (9.13)—(9.14), (9.8) and Exercise 9.2, we obtain 
Covas {vn Â, — 6), An, f (80 + h/Jn, 6)} 


op) St y- _ f'm 1 do? 3o? _ 
5) efa m+n MF FOD) ea fa h=h 


under Ho. The previous arguments, in particular Le Cam’s third lemma, show that 
A L a 
/n(&q — a0) —> Nfe, eat (6o)eq+1} under H,. 


The local asymptotic power of the {W,, > xa — a)} test is thus c œ> 1 — Ọs {x7 — a)}, where 
the noncentrality parameter is 


e 4 e 
mea TAO a ea 
egt Iy ear Xf O)dx- Dopey yy eq+1 
Figure 9.2 displays the local asymptotic powers of the two tests, c œ> 1 — ®, { x? (0.95)} (solid 
line) and c œ> 1 — ®¢ { x7 (0.95)} (dashed line), when f is the normalized Student ¢ density with 


5 degrees of freedom and when 6p is such that ëz Iy €g+1 = 4. Note that the local asymptotic 
power of the optimal Wald test is sometimes twice as large as that of score test. 
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Figure 9.2 Local asymptotic power of the optimal Wald test {Wnt > x? (0.95)} (solid line) and 


of the standard Wald test {W, > x7(0.95)} (dotted line), when f(y) = /v/v —2f,(yVv/v — 2) 
and v = 5. 


9.2 Maximum Likelihood Estimator with 
Misspecified Density 


The MLE requires the (unrealistic) assumption that f is known. What happens when f is mis- 
specified, that is, when we use On h with h = f? 

In this section, the usual assumption E n2 = ] will be replaced by alternative moment assump- 
tions that will be more relevant for the estimators considered. Under some regularity assumptions, 
the ergodic theorem entails that 


6,4 = arg max Q,(0), where Q,(6) > Q(@) = Ef log 2 (9) h\ 91 (60) a.s. 
6 o: (0) or (0) 
Here, the subscript f is added to the expectation symbol in order to emphasize the fact that 
the random variable no follows the distribution f, which does not necessarily coincide with the 
‘instrumental’ density h. This allows us to show that 


On. > O* = arg max Q(0) as. 
Note that the estimator Ên, h can be seen as a non-Gaussian QMLE. 


9.2.1 Condition for the Convergence of Ôn, n to 0o 


Note that under suitable identifiability conditions, o,(8o)/o; (0) = 1 if and only if 0 = 6. For 
the consistency of the estimator (that is, for 0* = 6o), it is thus necessary for the function o —> 
Efg(no, o), where g(x, o) = logoh(xo), to have a unique maximum at |: 


Erg(no,o) < Efg(no, 1) Vo>0, o Æl. (9.25) 


Remark 9.4 (Interpretation of the condition) If the distribution of X has a density f, and if h 
denotes any density, the quantity —2E¢ log A(X) is sometimes called the Kullback—Leibler contrast 
of h with respect to f. The Jensen inequality shows that the contrast is minimal for h = f. Note 
that h,(x) = oh(xo) is the density of Y/o, where Y has density h. The condition thus signifies 
that the density h minimizes the Kullback—Leibler contrast of any density of the family ho, o > 0, 
with respect to the density f. In other words, the condition says that it is impossible to get closer 
to f by scaling h. 
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It is sometimes useful to replace condition (9.25) by one of its consequences that is easier to 
handle. Assume the existence of 
dg(x,o) 1 h'(ox) 


81,0) = do o j h(ox) 


X. 


If there exists a neighborhood V (1) of 1 such that E p suPsey() |8100, 0)| < 00, the dominated 
convergence theorem shows that (9.25) implies the moment condition 


f AG) jons (9.26) 
xf(x)dx = —1. ; 
h(x) 

Obviously, condition (9.25) is satisfied for the ML,that is, when h = f (see Exercise 9.5), and 
also for the QML, as the following example shows. 


Example 9.2 (QML) When A is the M(0, 1) density, the estimator Ôn, h corresponds to the QMLE. 
In this case, if En? = l, we have Erg(no,0) = a? /2 + logo log V27, and this function 
possesses a unique maximum at ø = 1. We recall the fact that the QMLE is consistent even when 
f is not the (0, 1) density. 


The following example shows that for condition (9.25) to be satisfied it is sometimes necessary to 
reparameterize the model and to change the identifiability constraint En? = 1. 


Example 9.3 (Laplace QML) Consider the Laplace density 


1 
h(y) = zP (-lyl), A>O. 


Then Ey g(yo,0) = —o E|no| + logo — log 2. This function possesses a unique maximum at o = 
1/E|no|. In order to have consistency for a large class of density f, it thus suffices to replace 
to usual constraint E A = | in the GARCH model €; = on; by the new identifiability constraint 
E\|n,| = 1. Of course of no longer corresponds to the conditional variance, but to the conditional 


moment o; = E (|e;| | €u, u < t). 


The previous examples show that a particular choice of hcorresponds to a natural identifiability 
constraint. This constraint applies to a moment of n; (En? = 1 when h is N(O, 1), and E|n,| = 1 
when h is Laplace). Table 9.3 gives the natural identifiability constraints associated with var- 
ious choices of h. When these natural identifiability constraints are imposed on the GARCH 
model, the estimator On h can be interpreted as a non-Gaussian QMLE, and converges to 69, even 
when h + f. 


9.2.2 Reparameterization Implying the Convergence of Ôn,n to 0 


The following examples show that the estimator Ô, „n based on the misspecified density h # f 
generally converges to a value 0* Æ 4) when the model is not written with the natural identifiabil- 
ity constraint. 


Example 9.4 (Laplace QML for a usual GARCH model) Take h to be the Laplace density 
and assume that the GARCH model is of the form 


Er = Or 
2 = q 2 P 2 
o = a+ } i= G7 + viel Pojo; j 


with the usual constraint E n2 = 1. The estimator 6, „p does not converge to the parameter 


0o = (0, Qor, - - ++ Mog, Bors +--+ Bop)’: 
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Table 9.3 Identifiability constraint under which On-h is consistent. 


Law Instrumental density h Constraint 
2 

Gaussian a exp |- — Ey _ mEne =| 
Double gamma ag 1? lexp{—b|x|}, b, p>0 E\n,| = 2 
Laplace i exp {—|x|} E|n,| = 1 
Gamma Tae l exp {—b|x|} 10,00) (x) En = £ 
Double inverse gamma To |x|?! exp {—b/|x]} EW =? 

: V2. ay 
Double inverse x? eH |x| 2-1 exp {—vo?/2|x|} EW = + 
Double Weibull AlxP-lexp(—[xP*), 2>0 E|n,* =1 
Gaussian generalized AS exp (—|x|*/A) E|n|* = 
Inverse Weibull lx]! exp (—lx|7>), à>0 E|n|7* =1 
Double log-normal IRET exp [eee | E log |n| =m 


The model can, however, always be rewritten as 


*2 q *o *2 
oO; est Lare FE jt- j? 


with n* = ņn:/0, oř = oo; and ọ = f |x| f (x)dx. Since E|n7| = 1, the estimator Bah converges to 
* = (07a, 07001, -.-, 070g, Bois ---» Bop)’: 


Example 9.5 (QMLE of GARCH under the constraint E|n;| = 1) Assume now that the 
GARCH model is of the form 


| Et = Ort 
2 _ q p p a 
of =+} j- ie; EFÈ, 1 Pojo js 


with the constraint E|n,| = 1. If h is the Laplace density, the estimator Êna.n converges to the 
parameter 0o, regardless of the density f of 7; (satisfying mild regularity conditions). The model 


can be written as 
€r = ořný 
Ei 
of =at + DL ape ele | Bro, 


with the usual constraint Ene = | when 7* = 0, /0, 07 = oo; and 9 = „j f x? f(x)dx. The stan- 
dard QMLE does not converge to 99, but to 0* = (o7wo, 07 a1, splits 0° doq, Bot, +--+ Bop)’ 


9.2.3 Choice of Instrumental Density h 


We have seen that, for any fixed h, there exists an identifiability constraint implying the convergence 
of Bah to @ (see Table 9.3). In practice, we choose not the parameterization for which nh 
converges but the estimator that guarantees a consistent estimation of the model of interest. The 
instrumental function h is chosen to estimate the model under a given constraint, corresponding 
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to a given problem. As an example, suppose that we wish to estimate the conditional moment 
E,-1 (€f) := E (ef | €u, u < t) of a GARCH(p, q) ioes, It will be convenient to consider the 
parameterization €; = opn, under the constraint E m= = 1. The volatility o; will then be directly 
related to the conditional moment of interest, by the relation of = E;—1 (e$). In this particular 
case, the Gaussian QMLE is inconsistent (because, in particular, the QMLE of œ; converges to 
a,E n2). In view of (9.26), to find relevant instrumental functions h, one can solve 


h'(x) 
h(x) 


since E(A — àn?) = 0 and E f1 + h'(x)/h(m)m} = 0. The densities that solve this differential 
equation are of the form 


1+ 


x =A—Ax?, à £0, 


h(x) = e|x|*! exp (—Ax4/4), A>0. 


For à = 1 we obtain the double Weibull, and for 4 = 4 a generalized Gaussian, which is in 
accordance with the results given in Table 9.3. 

For the more general problem of estimating conditional moments of |e;| or log |e;|, Table 9.4 
gives the parameterization (that is, the moment constraint on n+) and the type of estimator (that is 
the choice of h) for the solution to be only a function of the volatility o, (a solution which is thus 
independent of the distribution f of n,). It is easy to see that for the instrumental function h of 
Table 9.4, the estimator Ên, h depends only on r and not on c and à. Indeed, taking the case r > 0, 
up to some constant we have 

€ ) 
AOINA 


which shows that Bh does not depend on c and å. In practice, one can thus choose the simplest 
constants in the instrumental function, for instance c = A = 1. 


n 


A 
log Ln,n(8) =-=) (1089; (0) + 


t=1 


9.2.4 Asymptotic Distribution of 6,,; 


Using arguments similar to those of Section 7.4, a Taylor expansion shows that, under (9.25), 


0= = lo Ln, Onn 
a g ni n) 
—— log Ln,h o ah h— fa) 
=- h\Y0 n 0000" h\70 ae 0 P 


where 


ica 1 a 1 o, (80) 
ap OB Enn) = 2 TTA ( o0) n) 


ie ( an —0; (80) 3o (0) 
=—=} gı Nt, 


Jn = 0:(0) J 203(@) 30 
Table 9.4 Choice of h as function of the prediction problem. 
Problem Constraint Solution Instrumental density h 
Ei lel, r>0 Eln =1 of cx]?! exp (—Alal"/r), A>0 
Ei lel? Elm" =1 or” clx] >! exp (—Aàlx|™"/r) 


E,- log |e;| E log |n:| = 0 log o; Ji] m|2x|~! exp {—A(log |x1)?} 
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and 
2 


1 3 t 
~—— log Lan (@o) = — 2 (Ne, 1 
z 0000" 2E „a (80) 72 r, 1) 


1 da2(0) 3o? (60) 
404 (00) 30 30' 


+op(1). 


The ergodic theorem and the CLT for martingale increments (see Section A.2) then entail that 


> 2 i 3o? (0 
Vina =) = an AE D a EO + opt) 
4 NO, 4r J7) (9.27) 
where 
J= ge Mette gs and th p= Ereim) (9.28) 
or 90 30 [E fe2(0, 1)} 


with gı (x, o) = ðg(x,o)/ðo and go(x, o) = ðgı (x, o)/ðo. 


Example 9.6 (Asymptotic distribution of the MLE) When h = f (that is, for the MLE), we 
have g(x, o) =logof(xo) and gi(x, o) = 07! + xf'(xo)/f (xo). Thus Eşg° (no, 1) = ¢f, as 
defined on page 221. This Fisher information can also be expressed as €¢ = — E fg2(no, 1). This 
shows that Tiy =¢ f L and we obtain (9.6). 


Example 9.7 (Asymptotic distribution of the QMLE) If we choose for h the density (x) = 
(2x)! exp(—x?/2), we have g(x, o) = logo — o?x?/2 — log VIr, gix, o) = o™! —ox? and 
g(x, 0) = —0 7? — x°. Thus Esg? (no, 1) = (ky — 1), with ky = f xt f (x)dx, and Efg2(no, 1) = 
—2. Therefore T, = (K — 1)/4, and we obtain the usual expression for the asymptotic variance 
of the QMLE. 


Example 9.8 (Asymptotic distribution of the Laplace QMLE) Write the GARCH model as 
€ = Orr, with the constraint E¢|n;| = 1. Let €(x) = t exp (—|x|) be the Laplace density. For 
h = £ we have g(x, o) = logo — o|x| — log2, gi(x,0) =o! — |x| and go(x,0) = —07?. We 
thus have Te F = Efn? - 1. 


Table 9.5 completes Table 9.1. Using the previous examples, this table gives the ARE of the 
QMLE and Laplace QMLE with respect to the MLE, in the case where f follows the Student t 
distribution. The table does not allow us to obtain the ARE of the QMLE with respect to Laplace 
QMLE, because the noise 7; has a different normalization with the standard QMLE or the Laplace 
QMLE (in other words, the two estimators do not converge to the same parameter). 


Table 9.5 Asymptotic relative efficiency of the MLE with respect to the QMLE and to the 
Laplace QMLE: Tå plt p and Ti pT gs where ¢ denotes the M(0, 1) density, and £(x) = 


5 exp (—l|x|) the Laplace density. For the QMLE, the Student f density f, with v degrees of free- 


dom is normalized so that En? = |, that is, the density of n; is f(y) = v v/v — 2 fi (yv v/v — 2). 
For the Laplace QMLE, n, has the density f(y) = E|t.| fp OElt |), so that E\n;| = 1. 


Th plg v 
5 6 7 8 9 10 20 30 100 


MLE - QMLE 25 1.667 14 1.273 1.2 1.154 1.033 1.014 1.001 
MLE — Laplace 1.063 1.037 1.029 1.028 1.030 1.034 1.070 1.089 1.124 
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9.3 Alternative Estimation Methods 


The estimation methods presented in this section are less popular among practitioners than the 
QML and LS methods, but each has specific features of interest. 


9.3.1 Weighted LSE for the ARMA Parameters 
Consider the estimation of the ARMA part of the ARMA(P, Q)-GARCH(p, q) model 


P Q 
Xı — © = X ao (Xni —co) +e — X bojerj 


i=l j=l 


a= vhn (9.29) 
q p 
hi = œ + X aoier_; + >X Boj hij; 


i=l j=l 


where (n+) is an iid(0,1) sequence and the coefficients wo, œo; and fo; satisfy the usual positivity 
constraints. The orders P, Q, p and q are assumed known. The parameter vector is 


v= (c,a),...ap,b,...,b9)', 


the true value of which is denoted by 9%, and the parameter space Y C R?*+2+!, Given observations 
Xi, ..., Xn and initial values, the sequence é, is defined recursively by (7.22). The weighted LSE 
is defined as a measurable solution of 


n 
ĝ, = arg minn”! J wē? (0), 
0 t=1 


where the weights w, are known positive measurable functions of X;—1, X;-2,.... One can take, 


for instance, 
t—1 


o! =14+ 0X, al 
k=1 


with E|X,|7° < oo and s € (0, 1). It can be shown that there exist constants K > 0 and p € (0, 1) 
such that 


t-1 t-1 
3 0€, 

lé:| < K (1+ InI) (14-5 ote and - < KY Xl. 
k=1 : k=1 


This entails that 


CO CO 
oč 
m 1+1/s „k =e 1+1/s „k 
inal eK 4nd (1438 e). 217, ex (14 A). 
k=1 k=1 
Thus 
9, |? = ‘ 
E\op@—) < K'E (1+ mD? (2 i] < 00, 
i k=l 


which implies a finite variance for the score vector wre, 0€; /dv. Ling (2005) deduces the asymptotic 
normality of vnn — vo), even in the case EX? =o. 
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9.3.2 Self-Weighted QMLE 


Recall that, for the ARMA-GARCH models, the asymptotic normality of the QMLE has been 
established under the condition EX i < oo (see Theorem 7.5). To obtain an asymptotically normal 
estimator of the parameter go = (0, 0)’ of the ARMA-GARCH model (9.29) with weaker moment 
assumptions on the observed process, Ling (2007) proposed a self-weighted QMLE of the form 


n 


Ên = arg minn”! X oko), 
ge® =i 


where ĉ (9) = e2 (9) /&2(9) + log &?(9), using standard notation. To understand the principle 
of this estimator, note that the minimized criterion converges to the limit criterion I(g) = 
Eyo;£;(g) satisfying 


2 2 2 
of (~) | of (go) | {€ (0) — €,(Vo)} 
+5 —-1;+E — 
oo) PO) me oF 
27:01 (Po) {Er(F) — €:(Vo)} 
o? (o) l 


Ko) — po) = Egyor {tog 
+ Egy 


The last expectation (when it exists) is null, because 7; is centered and independent of the other 
variables. The inequality x — 1 > logx entails that 


oF (o) m oF (po) 
2 2 
of (po) of (p) 


OP) iog ECO | = 


— r} > Epo, {tog o 
Lo? Go) oP (o) 


Egy {tog 
Thus, under the usual identifiability conditions, we have I(~) > I(go), with equality if and only 
if p = gp. Note that the orthogonality between n; and the weights œ, is essential. Ling (2007) 
showed the convergence and asymptotic normality of ¢, under the assumption E|X,|* < oo for 
some s > 0. 


9.3.3 Lp Estimators 


The previous weighted estimator requires the assumption E nt < oo. Practitioners often claim that 
financial series admit few moments. A GARCH process with infinite variance is obtained either 
by taking large values of the parameters, or by taking an infinite variance for 7;. Indeed, for a 
GARCH(1, 1) process, each of the two sets of assumptions 


(i) ao: + Bor > 1, En? = 1, 
(ii) En? = œ 


implies an infinite variance for €,. Under (i), and strict stationarity, the asymptotic distribution of 
the QMLE is generally Gaussian (see Section 7.1.1), whereas the usual estimators have nonstandard 
asymptotic distributions under (ii) (see Berkes and Horvath, 2003b; Hall and Yao, 2003; Mikosch 
and Straumann, 2002), which causes difficulties for inference. As an alternative to the QMLE, it 
is thus interesting to define estimators having an asymptotic normal distribution under (ii), or even 
in the more general situation where both ag; + Bo; > 1 and E nt = œ are allowed. A GARCH 
model is usually defined under the normalization constraint E n? = 1. When the assumption that 
E n? exists is relaxed, the GARCH coefficients can be identified by imposing, for instance, that the 
median of n? is t = 1. In the framework of ARCH(q) models, Horváth and Liese (2004) consider 
Lp estimators, including the L; estimator 


Hy 


n q 
. -1 2 2 
arg minn Ot E; — O — AiE] 
0 t=1 i=1 
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where, for instance, œ; I> 1+5 GE tet. When n? admits a density, continuous and 


positive around its median t = 1, the consistency and asymptotic normality of these estimators are 
shown in Horváth and Liese (2004), without any moment assumption. 


9.3.4 Least Absolute Value Estimation 


For ARCH and GARCH models, Peng and Yao (2003) studied several least absolute deviations 
estimators. An interesting specification is 


arg minn™' ` |log e? — log õ? (0)| . (9.30) 

7 t=l 
With this estimator it is convenient to define the GARCH parameters under the constraint that 
the median of n? is 1. A reparameterization of the standard GARCH models is thus necessary. 
Consider, for instance, a GARCH(1, 1) with parameters w, a; and 6,, and a Gaussian noise nz. 
Since the median of n? is t = 0.4549..., the median of the square of n¥ = n,/,/T is 1, and the 


model is rewritten as 


2 2 2 
6 = Om, of =tw+ TAE + bofa. 


It is interesting to note that the error terms log ne = log é — log 6? (0) are iid with median 0 when 
0 = 6. Intuitively, this is the reason why it is pointless to introduce weights in the sum (9.30). 
Under the moment assumption Ee? < oo and some regularity assumptions, Peng and Yao (2003) 
show that there exists a local solution of (9.30) which is weakly consistent and asymptotically 
normal, with rate of convergence n!/?. This convergence holds even when the distribution of the 
errors has a fat tail: only the moment condition EF n? = ] is required. 


9.3.5 Whittle Estimator 


In Chapter 2 we have seen that, under the condition that the fourth-order moments exist, the square 
of a GARCH(p, q) satisfies the ARMA(max(p, q), q) representation 


bop (L)€? = wo + Ya (Lut, (9.31) 


where 


max(p.q) p 
da =1- J Cort Boze’, Ya) =1-} bozi, u = (n? — Vo?. 
i=1 i=1 
The spectral density of e is 
A A 2 2 
z 6, = Ew. 
27 lbo (e7)? 


foa A) = 


Let ~.2(h) be the empirical autocovariance of é at lag h. At Fourier frequencies A; = 2x j/n € 
(—z, 1], the periodogram 


x —ihh; ae n n 
m= g pahe, jet={[-S]+1....[ FI}, 
can be considered as a nonparametric estimator of 27 fg, (àj). Let 


go (L) 
Wo(L) 


ü) = fe? — wo '(} . 
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It can be shown that 


2 m 
ou (20) fA) a > 62(6), 
2m Jun fola) 


with equality if and only if 6 = 69 (see Proposition 10.8.1 in Brockwell and Davis, 1991). In view 
of this inequality, it is natural to consider the so-called Whittle estimator 


o2(0) := Eu? (0) = 


For ARMA with iid innovations, the Whittle estimator has the same asymptotic behavior as the 
QMLE and LSE (which coincide in that case). For GARCH models, the Whittle estimator still 
exhibits the same asymptotic behavior as the LSE, but it is generally less accurate than the QMLE. 
Moreover, Giraitis and Robinson (2001), Mikosch and Straumann (2002) and Straumann (2005) 
have shown that the consistency requires the existence of Æ ef , and that the asymptotic normality 
requires Ee? < œ. 


9.4 Bibliographical Notes 


The central reference of Sections 9.1 and 9.2 is Berkes and Horváth (2004), who give precise conditions for the consistency and asymptotic normality of the estimators $\hat\theta_{n,h}$. Slightly different conditions implying consistency and asymptotic normality of the MLE can be found in Francq and Zakoian (2006b). Additional results, in particular concerning the interesting situation where the density $f$ of the iid noise is known up to a nuisance parameter, are available in Straumann (2005). The adaptive estimation of the GARCH models is studied in Drost and Klaassen (1997) and also in Engle and González-Rivera (1991), Linton (1993), González-Rivera and Drost (1999) and Ling and McAleer (2003b). Drost and Klaassen (1997), Drost, Klaassen and Werker (1997), Ling and McAleer (2003a) and Lee and Taniguchi (2005) give mild regularity conditions ensuring the LAN property of GARCH.

Several estimation methods for GARCH models have not been discussed here, among them 
Bayesian methods (see Geweke, 1989), the generalized method of moments (see Rich, Raymond 
and Butler, 1991), variance targeting (see Francq, Horváth and Zakoian, 2009) and robust methods 
(see Muler and Yohai, 2008). Rank-based estimators for GARCH coefficients (except the intercept) 
were recently proposed by Andrews (2009). These estimators are shown to be asymptotically 
normal under assumptions which do not include the existence of a finite fourth moment for the 
iid noise. 


9.5 Exercises 


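9.1 (The score of a scale parameter is centered)
Show that if $f$ is a differentiable density such that $\int|x||f'(x)|\,dx < \infty$, then

$$\int f(x)\left[\frac{\partial}{\partial\sigma}\log\left\{\frac{1}{\sigma}f\left(\frac{x}{\sigma}\right)\right\}\right]_{\sigma=1}dx = 0.$$

Deduce that the score vector defined by (9.3) is centered.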


9.2 (Covariance between the square and the score of the scale parameter)
Show that if $f$ is a differentiable density such that $\int x^2f(x)\,dx = 1$ and $\int(|x| + |x|^3)|f'(x)|\,dx < \infty$, then

$$\int(1 - x^2)\left\{1 + x\frac{f'(x)}{f(x)}\right\}f(x)\,dx = 2.$$
9.3 (Intuition behind Le Cam's third lemma)
Let $\phi_\theta(x) = (2\pi\sigma^2)^{-1/2}\exp\{-(x-\theta)^2/2\sigma^2\}$ be the $\mathcal{N}(\theta, \sigma^2)$ density and let

$$\Lambda(\theta, \theta_0, x) = \log\frac{\phi_\theta(x)}{\phi_{\theta_0}(x)}$$

be the log-likelihood ratio. Determine the distribution of

$$\begin{pmatrix}aX + b\\ \Lambda(\theta, \theta_0, X)\end{pmatrix}$$

when $X \sim \mathcal{N}(\theta_0, \sigma^2)$, and then when $X \sim \mathcal{N}(\theta, \sigma^2)$.

9.4 (Fisher information)
For the parametrization (9.11) on page 223, verify that […]

9.5 (Condition for the consistency of the MLE)
Let $\eta$ be a random variable with density $f$ such that $E|\eta|^r < \infty$ for some $r \neq 0$. Show that

$$E\log\sigma f(\sigma\eta) < E\log f(\eta), \qquad \forall\sigma \neq 1.$$

9.6 (Case where the Laplace QMLE is optimal)
Consider a GARCH model whose noise has the $\Gamma(b, b)$ distribution with density

$$f(x) = \frac{b^b}{2\Gamma(b)}|x|^{b-1}\exp(-b|x|), \qquad \Gamma(b) = \int_0^\infty x^{b-1}\exp(-x)\,dx,$$

where $b > 0$. Show that the Laplace QMLE is optimal.

9.7 (Comparison of the MLE, QMLE and Laplace QMLE)
Give a table similar to Table 9.5, but replace the Student $t$ distribution $f_\nu$ by the double $\Gamma(b, p)$ distribution

$$f_{b,p}(x) = \frac{b^p}{2\Gamma(p)}|x|^{p-1}\exp(-b|x|), \qquad \Gamma(p) = \int_0^\infty x^{p-1}\exp(-x)\,dx,$$

where $b, p > 0$.

9.8 (Asymptotic comparison of the estimators $\hat\theta_{n,h}$)
Compute the coefficient $\tau_{h,f}^2$ defined by (9.28) for each of the instrumental densities $h$ of Table 9.4. Compare the asymptotic behavior of the estimators $\hat\theta_{n,h}$.

9.9 (Fisher information at a pseudo-true value)
Consider a GARCH(p, q) model with parameter

$$\theta_0 = (\omega_0, \alpha_{01}, \dots, \alpha_{0q}, \beta_{01}, \dots, \beta_{0p})'.$$

1. Give an example of an estimator which does not converge to $\theta_0$, but which converges to a vector of the form

$$\theta^* = (\varrho^2\omega_0, \varrho^2\alpha_{01}, \dots, \varrho^2\alpha_{0q}, \beta_{01}, \dots, \beta_{0p})',$$

where $\varrho$ is a constant.

2. What is the relationship between $\sigma_t^2(\theta_0)$ and $\sigma_t^2(\theta^*)$?

3. Let $\Lambda_\varrho = \mathrm{diag}(\varrho^{-2}I_{q+1}, I_p)$ and

$$J(\theta) = E\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\theta).$$

Give an expression for $J(\theta^*)$ as a function of $J(\theta_0)$ and $\Lambda_\varrho$.


9.10 (Asymptotic distribution of the Laplace QMLE)
Determine the asymptotic distribution of the Laplace QMLE when the GARCH model does not satisfy the natural identifiability constraint $E|\eta_t| = 1$, but the usual constraint $E\eta_t^2 = 1$.


Part III


Extensions and Applications 


10 


Asymmetries 


Classical GARCH models, studied in Parts I and II, rely on modeling the conditional variance as 
a linear function of the squared past innovations. The merits of this specification are its ability 
to reproduce several important characteristics of financial time series — succession of quiet and 
turbulent periods, autocorrelation of the squares but absence of autocorrelation of the returns, 
leptokurticity of the marginal distributions — and the fact that it is sufficiently simple to allow for 
an extended study of the probability and statistical properties. 

From an empirical point of view, however, the classical GARCH modeling has an important 
drawback. Indeed, by construction, the conditional variance only depends on the modulus of the 
past variables: past positive and negative innovations have the same effect on the current volatility. 
This property is in contradiction to many empirical studies on series of stocks, showing a negative 
correlation between the squared current innovation and the past innovations: if the conditional 
distribution were symmetric in the past variables, such a correlation would be equal to zero. 
However, conditional asymmetry is a stylized fact: the volatility increase due to a price decrease 
is generally stronger than that resulting from a price increase of the same magnitude. 

The symmetry property of standard GARCH models has the following interpretation in terms of autocorrelations. If the law of $\eta_t$ is symmetric, and under the assumption that the GARCH process is second-order stationary, we have

$$\mathrm{Cov}(\sigma_t, \epsilon_{t-h}) = 0, \qquad h > 0, \qquad (10.1)$$

because $\sigma_t$ is an even function of the $\epsilon_{t-i}$, $i > 0$ (see Exercise 10.1). Introducing the positive and negative components of $\epsilon_t$,

$$\epsilon_t^+ = \max(\epsilon_t, 0), \qquad \epsilon_t^- = \min(\epsilon_t, 0),$$

it is easily seen that (10.1) holds if and only if

$$\mathrm{Cov}(\epsilon_t^+, \epsilon_{t-h}) = \mathrm{Cov}(\epsilon_t^-, \epsilon_{t-h}) = 0, \qquad h > 0. \qquad (10.2)$$


This characterization of the symmetry property in terms of autocovariances can be easily tested empirically, and is often rejected on financial series. As an example, for the log-returns series ($\epsilon_t = \log(p_t/p_{t-1})$) of the CAC 40 index presented in Chapter 1, we get the results shown in Table 10.1.

Table 10.1 Empirical autocorrelations (CAC 40 series, period 1988–1998).

h                        1        2        3        4        5       10       20       40
ρ̂(ε_t, ε_{t−h})       0.030    0.005   −0.032    0.028   −0.046*   0.016    0.003   −0.019
ρ̂(|ε_t|, |ε_{t−h}|)   0.090*   0.100*   0.118*   0.099*   0.086*   0.118*   0.055*   0.032
ρ̂(ε_t^+, ε_{t−h})     0.011   −0.094*  −0.148*  −0.018   −0.127*  −0.039*  −0.026   −0.064*

* indicates autocorrelations which are statistically significant at the 5% level, using $1/n$ as an approximation of the autocorrelation variances, for $n = 2385$.
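The three rows of Table 10.1 are ordinary sample cross-autocorrelations. A minimal sketch follows, assuming the log-returns are available as a plain Python list `eps` (the CAC 40 data are not reproduced here). The helper names are hypothetical. The band $1.96/\sqrt{n}$ is the iid-based approximation used in the table, which is not the valid bound under conditional heteroscedasticity (see Chapter 5).

```python
import math


def cross_autocorr(x, y, h):
    """Sample correlation between x_t and y_{t-h} (means and variances over the full sample)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((x[t] - mx) * (y[t - h] - my) for t in range(h, n)) / n
    return cov / (sx * sy)


def asymmetry_table(eps, lags):
    """Rows of a Table-10.1-style summary and the iid-based 5% band 1.96/sqrt(n)."""
    abs_eps = [abs(e) for e in eps]
    pos = [max(e, 0.0) for e in eps]  # eps_t^+
    rows = {
        "rho(eps_t, eps_t-h)": [cross_autocorr(eps, eps, h) for h in lags],
        "rho(|eps_t|, |eps_t-h|)": [cross_autocorr(abs_eps, abs_eps, h) for h in lags],
        "rho(eps_t^+, eps_t-h)": [cross_autocorr(pos, eps, h) for h in lags],
    }
    band = 1.96 / math.sqrt(len(eps))
    return rows, band
```

With the table's lags `[1, 2, 3, 4, 5, 10, 20, 40]`, entries whose absolute value exceeds `band` would be starred. For $n = 2385$ the band is about 0.040.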


The absence of significant autocorrelations of the returns and the correlation of their modulus or squares, which constitute the basic properties motivating the introduction of GARCH models, are clearly shown for these data. But just as evident is the existence of an asymmetry in the impact of past innovations on the current volatility. More precisely, admitting that the process $(\epsilon_t)$ is second-order stationary and can be decomposed as $\epsilon_t = \sigma_t\eta_t$, where $(\eta_t)$ is an iid sequence and $\sigma_t$ is a measurable, positive function of the past of $\epsilon_t$, we have

$$\rho(\epsilon_t^+, \epsilon_{t-h}) = K\,\mathrm{Cov}(\sigma_t, \epsilon_{t-h}) = K\left[\mathrm{Cov}(\sigma_t, \epsilon_{t-h}^+) + \mathrm{Cov}(\sigma_t, \epsilon_{t-h}^-)\right],$$

where $K > 0$. For the CAC data, except when $h = 1$, for which the autocorrelation is not significant, the estimates of $\rho(\epsilon_t^+, \epsilon_{t-h})$ seem to be significantly negative.¹ Thus

$$\mathrm{Cov}(\sigma_t, \epsilon_{t-h}^+) < \mathrm{Cov}(\sigma_t, -\epsilon_{t-h}^-),$$

which can be interpreted as a higher impact of the past price decreases on the current volatility, compared to the past price increases of the same magnitude. This phenomenon, $\mathrm{Cov}(\sigma_t, \epsilon_{t-h}) < 0$, is known in the finance literature as the leverage effect:² volatility tends to increase dramatically following bad news (that is, a fall in prices), and to increase moderately (or even to diminish) following good news.

The models we will consider in this chapter allow this asymmetry property to be incorporated. 


10.1 Exponential GARCH Model 


The following definition for the exponential GARCH (EGARCH) model mimics that given for the 
strong GARCH. 


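Definition 10.1 (EGARCH(p, q) process) Let $(\eta_t)$ be an iid sequence such that $E(\eta_t) = 0$ and $\mathrm{Var}(\eta_t) = 1$. Then $(\epsilon_t)$ is said to be an exponential GARCH (EGARCH(p, q)) process if it satisfies an equation of the form

$$\begin{cases}\epsilon_t = \sigma_t\eta_t\\ \log\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i g(\eta_{t-i}) + \sum_{j=1}^{p}\beta_j\log\sigma_{t-j}^2,\end{cases} \qquad (10.3)$$

where

$$g(\eta_{t-i}) = \theta\eta_{t-i} + \varsigma\left(|\eta_{t-i}| - E|\eta_{t-i}|\right), \qquad (10.4)$$

and $\omega$, $\alpha_i$, $\beta_j$, $\theta$ and $\varsigma$ are real numbers.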
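A simulation of (10.3)–(10.4) makes the recursion concrete. This is a minimal sketch under stated assumptions: $\eta_t$ is Gaussian (so $E|\eta_t| = \sqrt{2/\pi}$), pre-sample values use the arbitrary start $\log\sigma^2 = \omega$, and a burn-in period washes out that start. `simulate_egarch` and its arguments are hypothetical names.

```python
import math
import random

E_ABS_ETA = math.sqrt(2.0 / math.pi)  # E|eta_t| when eta_t ~ N(0, 1)


def g(eta, theta, varsigma):
    """Specification (10.4): g(eta) = theta*eta + varsigma*(|eta| - E|eta|)."""
    return theta * eta + varsigma * (abs(eta) - E_ABS_ETA)


def simulate_egarch(n, omega, alphas, betas, theta, varsigma, seed=0, burn=500):
    """Simulate an EGARCH(p, q) path from (10.3); returns (eps, log_sigma2)."""
    rng = random.Random(seed)
    q, p = len(alphas), len(betas)
    past_eta = [0.0] * q       # eta_{t-1}, ..., eta_{t-q}
    past_log = [omega] * p     # log sigma^2_{t-1}, ..., log sigma^2_{t-p} (arbitrary start)
    eps, log_sigma2 = [], []
    for t in range(n + burn):
        ls = (omega
              + sum(a * g(e, theta, varsigma) for a, e in zip(alphas, past_eta))
              + sum(b * l for b, l in zip(betas, past_log)))
        eta = rng.gauss(0.0, 1.0)
        if t >= burn:
            eps.append(math.exp(ls / 2.0) * eta)  # eps_t = sigma_t * eta_t
            log_sigma2.append(ls)
        past_eta = ([eta] + past_eta)[:q]
        past_log = ([ls] + past_log)[:p]
    return eps, log_sigma2
```

With $\theta = \varsigma = 0$ the function $g$ vanishes and $\log\sigma_t^2$ settles at $\omega/(1 - \beta_1)$ in the first-order case, which is a convenient sanity check.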


¹Recall, however, that for a noise which is conditionally heteroscedastic, the valid asymptotic bounds at the 95% significance level are not $\pm 1.96/\sqrt{n}$ (see Chapter 5).

²When the price of a stock falls, the debt–equity ratio of the company increases. This entails an increase of the risk and hence of the volatility of the stock. When the price rises, the volatility also increases but by a smaller amount.
Remark 10.1 (On the EGARCH model)

1. The relation

$$\sigma_t^2 = e^{\omega}\prod_{i=1}^{q}\exp\{\alpha_i g(\eta_{t-i})\}\prod_{j=1}^{p}\left(\sigma_{t-j}^2\right)^{\beta_j}$$

shows that, in contrast to the classical GARCH, the volatility has multiplicative dynamics. The positivity constraints on the coefficients can be avoided, because the logarithm can be of any sign.

2. According to the usual interpretation, however, innovations of large modulus should increase volatility. This entails constraints on the coefficients: for instance, if $\log\sigma_t^2 = \omega + \theta\eta_{t-1} + \varsigma(|\eta_{t-1}| - E|\eta_{t-1}|)$, then $\sigma_t^2$ increases with $|\eta_{t-1}|$, the sign of $\eta_{t-1}$ being fixed, if and only if $-\varsigma < \theta < \varsigma$. In the general case it suffices to impose

$$-\varsigma < \theta < \varsigma, \qquad \alpha_i \geq 0, \qquad \beta_j \geq 0.$$

3. The asymmetry property is taken into account through the coefficient $\theta$. For instance, let $\theta < 0$ and $\log\sigma_t^2 = \omega + \theta\eta_{t-1}$: if $\eta_{t-1} < 0$ (that is, if $\epsilon_{t-1} < 0$), the variable $\log\sigma_t^2$ will be larger than its mean $\omega$, and it will be smaller if $\epsilon_{t-1} > 0$. Thus, we obtain the typical asymmetry property of financial time series.

4. Another difference from the classical GARCH is that the conditional variance is written as a function of the past standardized innovations (that is, divided by their conditional standard deviation), instead of the past innovations. In particular, $\log\sigma_t^2$ is a strong ARMA$(p, q - q')$ process, where $q'$ is the first integer $i$ such that $\alpha_i \neq 0$, because $(g(\eta_t))$ is a strong white noise, with variance

$$\mathrm{Var}[g(\eta_t)] = \theta^2 + \varsigma^2\mathrm{Var}(|\eta_t|) + 2\theta\varsigma\,\mathrm{Cov}(\eta_t, |\eta_t|).$$

5. A formulation which is very close to the EGARCH is the Log-GARCH, defined by

$$\log\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\log(|\epsilon_{t-i}| - \varsigma_i\epsilon_{t-i}) + \sum_{j=1}^{p}\beta_j\log\sigma_{t-j}^2,$$

where, obviously, one has to impose $|\varsigma_i| < 1$.

6. The specification (10.4) allows for sign effects, through $\theta\eta_{t-i}$, and for modulus effects through $\varsigma(|\eta_{t-i}| - E|\eta_{t-i}|)$. This obviously induces, however, at least in the case $q = 1$, an identifiability problem, which can be solved by setting $\varsigma = 1$. Note also that, to allow different sign effects for the different lags, one could make $\theta$ depend on the lag index $i$, through the formulation

$$\begin{cases}\epsilon_t = \sigma_t\eta_t\\ \log\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\left\{\theta_i\eta_{t-i} + (|\eta_{t-i}| - E|\eta_{t-i}|)\right\} + \sum_{j=1}^{p}\beta_j\log\sigma_{t-j}^2.\end{cases} \qquad (10.5)$$
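Items 2 and 3 can be checked on the first-order news impact function $\eta_{t-1} \mapsto \log\sigma_t^2 = \omega + \theta\eta_{t-1} + \varsigma(|\eta_{t-1}| - E|\eta_{t-1}|)$. This is a minimal sketch, assuming Gaussian $\eta_t$. The function names are hypothetical.

```python
import math

E_ABS_ETA = math.sqrt(2.0 / math.pi)  # E|eta| for eta ~ N(0, 1)


def log_var_impact(eta_prev, omega, theta, varsigma):
    """log sigma_t^2 as a function of eta_{t-1} for the first-order EGARCH of item 2."""
    return omega + theta * eta_prev + varsigma * (abs(eta_prev) - E_ABS_ETA)


def modulus_increasing(theta, varsigma):
    """Item 2: sigma_t^2 increases with |eta_{t-1}| (sign fixed) iff -varsigma < theta < varsigma."""
    return -varsigma < theta < varsigma
```

With $\theta = -0.1$ and $\varsigma = 0.2$, the condition of item 2 holds. As in item 3, a negative shock then raises $\log\sigma_t^2$ more than a positive shock of the same size. With $\theta = 0.3 > \varsigma$, larger negative shocks instead lower the volatility.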


As we have seen, specifications of the function $g(\cdot)$ that are different from (10.4) are possible, depending on the kind of empirical properties we are trying to mimic. The following result does not depend on the specification chosen for $g(\cdot)$. It is, however, assumed that $Eg(\eta_t)$ exists and is equal to 0.

Theorem 10.1 (Stationarity of the EGARCH(p, q) process) Assume that $g(\eta_t)$ is not almost surely equal to zero and that the polynomials $\alpha(z) = \sum_{i=1}^{q}\alpha_iz^i$ and $\beta(z) = 1 - \sum_{i=1}^{p}\beta_iz^i$ have no common root, with $\alpha(z)$ not identically null. Then, the EGARCH(p, q) model defined in (10.3) admits a strictly stationary and nonanticipative solution if and only if the roots of $\beta(z)$ are outside the unit circle. This solution is such that $E(\log\epsilon_t^2)^2 < \infty$ whenever $E(\log\eta_t^2)^2 < \infty$ and $Eg^2(\eta_t) < \infty$.

If, in addition,

$$\prod_{i=1}^{\infty}E\exp\{|\lambda_ig(\eta_t)|\} < \infty, \qquad (10.6)$$

where the $\lambda_i$ are defined by $\alpha(L)/\beta(L) = \sum_{i=1}^{\infty}\lambda_iL^i$, then $(\epsilon_t)$ is a white noise with variance

$$E(\epsilon_t^2) = E(\sigma_t^2) = e^{\omega^*}\prod_{i=1}^{\infty}g_\eta(\lambda_i),$$

where $\omega^* = \omega/\beta(1)$ and $g_\eta(x) = E[\exp\{xg(\eta_t)\}]$.


Proof. We have $\log\epsilon_t^2 = \log\sigma_t^2 + \log\eta_t^2$. Because $\log\sigma_t^2$ is the solution of an ARMA$(p, q - 1)$ model, with AR polynomial $\beta$, the assumptions made on the lag polynomials are necessary and sufficient to express, in a unique way, $\log\sigma_t^2$ as an infinite-order moving average:

$$\log\sigma_t^2 = \omega^* + \sum_{i=1}^{\infty}\lambda_ig(\eta_{t-i}), \quad \text{a.s.}$$

It follows that the processes $(\log\sigma_t^2)$ and $(\log\epsilon_t^2)$ are strictly stationary. The process $(\log\sigma_t^2)$ is second-order stationary and, under the assumption $E(\log\eta_t^2)^2 < \infty$, so is $(\log\epsilon_t^2)$. Moreover, using the previous expansion,

$$\epsilon_t^2 = \sigma_t^2\eta_t^2 = e^{\omega^*}\prod_{i=1}^{\infty}\exp\{\lambda_ig(\eta_{t-i})\}\eta_t^2, \quad \text{a.s.} \qquad (10.7)$$

Using the fact that the process $(g(\eta_t))$ is iid, we get the desired result on the expectation of $(\epsilon_t^2)$ (Exercise 10.4).
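The coefficients $\lambda_i$ of $\alpha(L)/\beta(L)$ used in Theorem 10.1 can be computed recursively. Matching coefficients in $\beta(z)\sum_{i\geq1}\lambda_iz^i = \alpha(z)$ gives $\lambda_k = \alpha_k + \sum_{j=1}^{\min(k,p)}\beta_j\lambda_{k-j}$, with $\alpha_k = 0$ for $k > q$. This is a minimal sketch, and the helper names are hypothetical.

```python
def ma_inf_coefficients(alphas, betas, n_terms):
    """lambda_1..lambda_n with alpha(L)/beta(L) = sum_{i>=1} lambda_i L^i.

    alpha(z) = sum_{i=1}^q alpha_i z^i and beta(z) = 1 - sum_{j=1}^p beta_j z^j.
    """
    lam = [0.0] * (n_terms + 1)  # lam[0] = 0 because alpha(0) = 0
    for k in range(1, n_terms + 1):
        a_k = alphas[k - 1] if k <= len(alphas) else 0.0
        lam[k] = a_k + sum(betas[j - 1] * lam[k - j]
                           for j in range(1, min(k, len(betas)) + 1))
    return lam[1:]


def omega_star(omega, betas):
    """omega* = omega / beta(1), the intercept of the MA(infinity) expansion."""
    return omega / (1.0 - sum(betas))
```

For an EGARCH(1, 1) this reproduces $\lambda_i = \alpha_1\beta_1^{i-1}$. For an EARCH(q) ($\beta_j = 0$), the $\lambda_i$ vanish beyond lag $q$, as noted in Remark 10.2.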


Remark 10.2

1. When $\beta_j = 0$ for $j = 1, \dots, p$ (EARCH(q) model), the coefficients $\lambda_i$ cancel for $i > q$. Hence, condition (10.6) is always satisfied, provided that $E\exp\{|\alpha_ig(\eta_t)|\} < \infty$, for $i = 1, \dots, q$. If the tails of the distribution of $\eta_t$ are not too heavy (the condition fails for the Student $t$ distributions and specification (10.4)), an EARCH(q) process is then stationary, in both the strict and second-order senses, whatever the values of the coefficients $\alpha_i$.

2. When $\eta_t$ is $\mathcal{N}(0, 1)$ distributed, and if $g(\cdot)$ is such that (10.4) holds, one can verify (Exercise 10.5) that

$$\log E\exp\{|\lambda_ig(\eta_t)|\} = O(\lambda_i). \qquad (10.8)$$

Since the $\lambda_i$ are obtained from the inversion of the polynomial $\beta(\cdot)$, they decrease exponentially fast to zero. It is then easy to check that (10.6) holds true in this case, without any supplementary assumption on the model coefficients. The strict and second-order stationarity conditions thus coincide, contrary to what happened in the standard GARCH case. To compute the second-order moments, classical integration calculus shows that

$$g_\eta(\lambda_i) = \exp\left\{-\lambda_i\varsigma\sqrt{\frac{2}{\pi}}\right\}\left[\exp\left\{\frac{\lambda_i^2(\theta + \varsigma)^2}{2}\right\}\Phi\{\lambda_i(\theta + \varsigma)\} + \exp\left\{\frac{\lambda_i^2(\theta - \varsigma)^2}{2}\right\}\Phi\{\lambda_i(\varsigma - \theta)\}\right],$$

where $\Phi$ denotes the cumulative distribution function of the $\mathcal{N}(0, 1)$.
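The closed form of $g_\eta$ can be checked against a direct numerical integration of $E\exp\{\lambda g(\eta_t)\}$. This is a minimal sketch, assuming $\eta_t \sim \mathcal{N}(0, 1)$ and specification (10.4), with $\Phi$ computed from `math.erf`. The function names are hypothetical.

```python
import math

E_ABS_ETA = math.sqrt(2.0 / math.pi)  # E|eta| for eta ~ N(0, 1)


def norm_cdf(x):
    """Standard normal cdf Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_eta_closed(lam, theta, varsigma):
    """g_eta(lam) = E exp{lam g(eta)} from the closed form of Remark 10.2."""
    a = lam * (theta + varsigma)
    b = lam * (theta - varsigma)
    return math.exp(-lam * varsigma * E_ABS_ETA) * (
        math.exp(a * a / 2.0) * norm_cdf(a) + math.exp(b * b / 2.0) * norm_cdf(-b))


def g_eta_quad(lam, theta, varsigma, lim=12.0, n=24000):
    """Same expectation by the trapezoidal rule on [-lim, lim] (the grid contains the kink 0)."""
    h = 2.0 * lim / n
    total = 0.0
    for k in range(n + 1):
        x = -lim + k * h
        gx = theta * x + varsigma * (abs(x) - E_ABS_ETA)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(lam * gx - x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)
```

The two functions agree to several digits for moderate values of $\lambda$, and $g_\eta(0) = 1$ as it should.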
Theorem 10.2 (Moments of the EGARCH(p, q) process) Let $m$ be a positive integer. Under the conditions of Theorem 10.1 and if

$$\mu_{2m} = E(\eta_t^{2m}) < \infty, \qquad \prod_{i=1}^{\infty}E\exp\{|m\lambda_ig(\eta_t)|\} < \infty,$$

$(\epsilon_t^2)$ admits a moment of order $m$ given by

$$E(\epsilon_t^{2m}) = \mu_{2m}e^{m\omega^*}\prod_{i=1}^{\infty}g_\eta(m\lambda_i).$$

Proof. The result straightforwardly follows from (10.7) and Exercise 10.4.

The previous computation shows that in the Gaussian case, moments exist at any order. This indicates that the leptokurticity property may be more difficult to capture with EGARCH than with standard GARCH models.
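In the Gaussian case, Theorem 10.2 gives $\kappa_\epsilon = E\epsilon_t^4/(E\epsilon_t^2)^2 = 3\prod_{i\geq1}g_\eta(2\lambda_i)/g_\eta^2(\lambda_i)$, because the factor $e^{2\omega^*}$ cancels. The following is a minimal sketch for an EGARCH(1, 1), where $\lambda_i = \alpha_1\beta_1^{i-1}$. It assumes $|\beta_1| < 1$ and truncates the infinite product after a finite number of terms, an approximation justified by the geometric decay of $\lambda_i$ and by (10.8). Names are hypothetical.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_eta(lam, theta, varsigma):
    """E exp{lam g(eta)}, eta ~ N(0, 1), g from (10.4) (closed form of Remark 10.2)."""
    a, b = lam * (theta + varsigma), lam * (theta - varsigma)
    return math.exp(-lam * varsigma * math.sqrt(2.0 / math.pi)) * (
        math.exp(a * a / 2.0) * norm_cdf(a) + math.exp(b * b / 2.0) * norm_cdf(-b))


def egarch11_moment(m, omega, alpha, beta, theta, varsigma, n_terms=2000):
    """E(eps_t^{2m}) = mu_{2m} e^{m omega*} prod_i g_eta(m lambda_i), product truncated."""
    mu_2m = math.prod(range(1, 2 * m, 2))  # (2m - 1)!! = E eta^{2m} for N(0, 1)
    log_prod = sum(math.log(g_eta(m * alpha * beta ** (i - 1), theta, varsigma))
                   for i in range(1, n_terms + 1))
    return mu_2m * math.exp(m * omega / (1.0 - beta) + log_prod)


def egarch11_kurtosis(alpha, beta, theta, varsigma):
    """E eps^4 / (E eps^2)^2; omega cancels, so it is set to 0 here."""
    return (egarch11_moment(2, 0.0, alpha, beta, theta, varsigma)
            / egarch11_moment(1, 0.0, alpha, beta, theta, varsigma) ** 2)
```

By Jensen's inequality $g_\eta(2\lambda) \geq g_\eta^2(\lambda)$, so the kurtosis exceeds 3 as soon as $\alpha_1 \neq 0$. The computation stays finite for every order, in line with the remark above.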

Assuming that $E(\log\eta_t^2)^2 < \infty$, the autocorrelation structure of the process $(\log\epsilon_t^2)$ can be derived by taking advantage of the ARMA form of the dynamics of $\log\sigma_t^2$. Indeed, replacing the terms in $\log\sigma_{t-j}^2$ by $\log\epsilon_{t-j}^2 - \log\eta_{t-j}^2$, we get

$$\log\epsilon_t^2 = \omega + \log\eta_t^2 + \sum_{i=1}^{q}\alpha_ig(\eta_{t-i}) + \sum_{j=1}^{p}\beta_j\log\epsilon_{t-j}^2 - \sum_{j=1}^{p}\beta_j\log\eta_{t-j}^2.$$

Let

$$v_t = \log\epsilon_t^2 - \sum_{j=1}^{p}\beta_j\log\epsilon_{t-j}^2 = \omega + \log\eta_t^2 + \sum_{i=1}^{q}\alpha_ig(\eta_{t-i}) - \sum_{j=1}^{p}\beta_j\log\eta_{t-j}^2.$$

One can easily verify that $(v_t)$ has finite variance. Since $v_t$ only depends on a finite number $r$ ($r = \max(p, q)$) of past values of $\eta_t$, it is clear that $\mathrm{Cov}(v_t, v_{t-k}) = 0$ for $k > r$. It follows that $(v_t)$ is an MA$(r)$ process (with intercept) and thus that $(\log\epsilon_t^2)$ is an ARMA$(p, r)$ process. This result is analogous to that obtained for the classical GARCH models, for which an ARMA$(r, p)$ representation was exhibited for $\epsilon_t^2$. Apart from the inversion of the integers $r$ and $p$, it is important to note that the noise of the ARMA equation of a GARCH is the strong innovation of the square, whereas the noise involved in the ARMA equation of an EGARCH is generally not the strong innovation of $\log\epsilon_t^2$. Under this limitation, the ARMA representation can be used to identify the orders $p$ and $q$ and to estimate the parameters $\beta_j$ and $\alpha_i$ (although the latter do not explicitly appear in the representation).
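The MA$(r)$ structure of $v_t$ can be checked by simulation. This is a minimal sketch, assuming a Gaussian EGARCH(1, 1), so that $r = 1$ and $v_t = \log\epsilon_t^2 - \beta_1\log\epsilon_{t-1}^2$. The parameter values used below are arbitrary illustrations, and the function names are hypothetical.

```python
import math
import random

E_ABS_ETA = math.sqrt(2.0 / math.pi)  # E|eta| for eta ~ N(0, 1)


def simulate_log_eps2(n, omega, alpha, beta, theta, varsigma, seed=1, burn=500):
    """log eps_t^2 = log sigma_t^2 + log eta_t^2 for a Gaussian EGARCH(1, 1)."""
    rng = random.Random(seed)
    ls, eta_prev, out = omega / (1.0 - beta), 0.0, []
    for t in range(n + burn):
        g_prev = theta * eta_prev + varsigma * (abs(eta_prev) - E_ABS_ETA)
        ls = omega + alpha * g_prev + beta * ls
        eta = rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(ls + math.log(eta * eta))
        eta_prev = eta
    return out


def v_series(y, beta1):
    """v_t = log eps_t^2 - beta_1 log eps_{t-1}^2 (first-order case)."""
    return [y[t] - beta1 * y[t - 1] for t in range(1, len(y))]


def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n, m = len(x), sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x) / n
    return sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n / c0
```

For instance, with `y = simulate_log_eps2(20000, -0.1, 0.2, 0.9, -0.1, 0.3)` and `v = v_series(y, 0.9)`, `acf(v, 1)` is clearly negative. It is driven by the term $-\beta_1\log\eta_{t-1}^2$ shared by $v_t$ and $v_{t-1}$. By contrast, `acf(v, k)` for $k \geq 2$ stays within sampling error of zero.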

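The autocorrelations of $(\epsilon_t^2)$ can be obtained from formula (10.7). Provided the moments exist, we have, for $h > 0$,

$$E(\epsilon_t^2\epsilon_{t-h}^2) = E\left[e^{2\omega^*}\prod_{i=1}^{h-1}\exp\{\lambda_ig(\eta_{t-i})\}\,\eta_t^2\eta_{t-h}^2\exp\{\lambda_hg(\eta_{t-h})\}\prod_{i=h+1}^{\infty}\exp\{(\lambda_i + \lambda_{i-h})g(\eta_{t-i})\}\right]$$

$$= e^{2\omega^*}\prod_{i=1}^{h-1}g_\eta(\lambda_i)\,E\left(\eta_t^2\exp\{\lambda_hg(\eta_t)\}\right)\prod_{i=h+1}^{\infty}g_\eta(\lambda_i + \lambda_{i-h}),$$

the first product being replaced by 1 if $h = 1$. For $h > 0$, this leads to

$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) = e^{2\omega^*}\left[\prod_{i=1}^{h-1}g_\eta(\lambda_i)\,E\left(\eta_t^2\exp\{\lambda_hg(\eta_t)\}\right)\prod_{i=h+1}^{\infty}g_\eta(\lambda_i + \lambda_{i-h}) - \prod_{i=1}^{\infty}g_\eta^2(\lambda_i)\right].$$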


10.2 Threshold GARCH Model 


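A natural way to introduce asymmetry is to specify the conditional variance as a function of the positive and negative parts of the past innovations. Recall that

$$\epsilon_t^+ = \max(\epsilon_t, 0), \qquad \epsilon_t^- = \min(\epsilon_t, 0),$$

and note that $\epsilon_t = \epsilon_t^+ + \epsilon_t^-$. The threshold GARCH (TGARCH) class of models introduces a threshold effect into the volatility.

Definition 10.2 (TGARCH(p, q) process) Let $(\eta_t)$ be an iid sequence of random variables such that $E(\eta_t) = 0$ and $\mathrm{Var}(\eta_t) = 1$. Then $(\epsilon_t)$ is called a threshold GARCH(p, q) process if it satisfies an equation of the form

$$\begin{cases}\epsilon_t = \sigma_t\eta_t\\ \sigma_t = \omega + \sum_{i=1}^{q}\left(\alpha_{i,+}\epsilon_{t-i}^+ - \alpha_{i,-}\epsilon_{t-i}^-\right) + \sum_{j=1}^{p}\beta_j\sigma_{t-j},\end{cases} \qquad (10.9)$$

where $\omega$, $\alpha_{i,+}$, $\alpha_{i,-}$ and $\beta_j$ are real numbers.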


Remark 10.3 (On the TGARCH model)

1. Under the constraints

$$\omega > 0, \qquad \alpha_{i,+} \geq 0, \qquad \alpha_{i,-} \geq 0, \qquad \beta_j \geq 0, \qquad (10.10)$$

the variable $\sigma_t$ is always strictly positive and represents the conditional standard deviation of $\epsilon_t$. In general, the conditional standard deviation of $\epsilon_t$ is $|\sigma_t|$: imposing the positivity of $\sigma_t$ is not required (contrary to the classical GARCH models, based on the specification of $\sigma_t^2$).

2. The GJR-GARCH model (named for Glosten, Jagannathan and Runkle, 1993) is a variant, defined by

$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\left(\alpha_i\epsilon_{t-i}^2 + \gamma_i\epsilon_{t-i}^2\mathbb{1}_{\{\epsilon_{t-i} > 0\}}\right) + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2,$$

which corresponds to squaring the variables involved in the second equation of (10.9).

3. Through the coefficients $\alpha_{i,+}$ and $\alpha_{i,-}$, the current volatility depends on both the modulus and the sign of past returns. The model is flexible, allowing the lags $i$ of the past returns to display different asymmetries. Note also that this class contains, as special cases, models displaying no asymmetry, whose properties are very similar to those of the standard GARCH. Such models are obtained for $\alpha_{i,+} = \alpha_{i,-} := \alpha_i$ ($i = 1, \dots, q$) and take the form

$$\sigma_t = \omega + \sum_{i=1}^{q}\alpha_i|\epsilon_{t-i}| + \sum_{j=1}^{p}\beta_j\sigma_{t-j}$$

(since $|\epsilon_t| = \epsilon_t^+ - \epsilon_t^-$). This specification is called absolute value GARCH (AVGARCH). Whether it is preferable to model the conditional variance or the conditional standard deviation is an open issue. However, it must be noted that for regression models with non-Gaussian and heteroscedastic errors, one can show that estimators of the noise variance based on the absolute residuals are more efficient than those based on the squared residuals (see Davidian and Carroll, 1987).


Figure 10.1 depicts the major difference between GARCH and TGARCH models. The so-called ‘news impact curves’ display the impact of the innovations at time $t - 1$ on the volatility at time $t$, for first-order models. In this figure, the coefficients have been chosen in such a way that the marginal variances of $\epsilon_t$ in the two models coincide. In this TARCH example, in accordance with the properties of financial time series, negative past values of $\epsilon_{t-1}$ have more impact on the volatility than positive values of the same magnitude. The impact is, of course, symmetrical in the ARCH case.
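The news impact curves of Figure 10.1 can be tabulated directly. This is a minimal sketch: the coefficients are taken from the caption of Figure 10.1 (ARCH(1): $\sigma_t = \sqrt{1 + 0.38\epsilon_{t-1}^2}$; TARCH(1): $\sigma_t = 1 - 0.5\epsilon_{t-1}^- + 0.2\epsilon_{t-1}^+$), with the convention $\epsilon^- = \min(\epsilon, 0)$ of this chapter. The function names are hypothetical.

```python
import math


def arch1_sigma(eps_prev, omega=1.0, alpha=0.38):
    """ARCH(1): sigma_t = sqrt(omega + alpha * eps_{t-1}^2), symmetric in eps_{t-1}."""
    return math.sqrt(omega + alpha * eps_prev ** 2)


def tarch1_sigma(eps_prev, omega=1.0, a_plus=0.2, a_minus=0.5):
    """TARCH(1): sigma_t = omega + a_plus * eps^+ - a_minus * eps^-, with eps^- = min(eps, 0)."""
    return omega + a_plus * max(eps_prev, 0.0) - a_minus * min(eps_prev, 0.0)


grid = [-10.0, -5.0, 0.0, 5.0, 10.0]
curves = [(e, arch1_sigma(e), tarch1_sigma(e)) for e in grid]
```

The TARCH curve is piecewise linear with a steeper slope on the negative side. The ARCH curve is even in $\epsilon_{t-1}$.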

TGARCH models display linearity properties similar to those encountered for the GARCH. 
Under the positivity constraints (10.10), we have 


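$$\epsilon_t^+ = \sigma_t\eta_t^+, \qquad \epsilon_t^- = \sigma_t\eta_t^-, \qquad (10.11)$$

which allows us to write the conditional standard deviation in the form

$$\sigma_t = \omega + \sum_{i=1}^{\max\{p,q\}}a_i(\eta_{t-i})\sigma_{t-i}, \qquad (10.12)$$

where $a_i(z) = \alpha_{i,+}z^+ - \alpha_{i,-}z^- + \beta_i$, $i = 1, \dots, \max\{p, q\}$. The dynamics of $\sigma_t$ is thus given by a random coefficient autoregressive model.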


Stationarity of the TGARCH(1, 1) Model 


The study of the stationarity properties of the TGARCH(1, 1) model is based on (10.12) and follows from similar arguments to the GARCH(1, 1) case. The strict stationarity condition is written as

$$E[\log(\alpha_{1,+}\eta_t^+ - \alpha_{1,-}\eta_t^- + \beta_1)] < 0. \qquad (10.13)$$

Figure 10.1 News impact curves for the ARCH(1) model, $\epsilon_t = \sqrt{1 + 0.38\epsilon_{t-1}^2}\,\eta_t$ (dashed line), and TARCH(1) model, $\epsilon_t = (1 - 0.5\epsilon_{t-1}^- + 0.2\epsilon_{t-1}^+)\eta_t$ (solid line).
In particular, for the TARCH(1) model ($\beta_1 = 0$) we have

$$\log(\alpha_{1,+}\eta_t^+ - \alpha_{1,-}\eta_t^-) = \log(\alpha_{1,+})\mathbb{1}_{\{\eta_t > 0\}} + \log(\alpha_{1,-})\mathbb{1}_{\{\eta_t < 0\}} + \log|\eta_t|.$$

Hence, if the distribution of $(\eta_t)$ is symmetric, the expectation of each of the two indicator variables is equal to 1/2 and the strict stationarity condition reduces to

$$\alpha_{1,+}\alpha_{1,-} < e^{-2E\log|\eta_t|}.$$
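For Gaussian $\eta_t$ this bound is explicit. A standard fact, not stated in the text, is that $E\log|\eta_t| = -(\gamma_E + \log 2)/2$, where $\gamma_E \approx 0.5772$ is Euler's constant. The TARCH(1) condition then reads $\alpha_{1,+}\alpha_{1,-} < 2e^{\gamma_E} \approx 3.562$. The sketch below checks this numerically; the substitution $x = e^u$ removes the logarithmic singularity at 0. Names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def e_log_abs_eta(lo=-40.0, hi=3.0, n=43000):
    """E log|eta|, eta ~ N(0, 1), as 2 * int_0^inf log(x) phi(x) dx with x = exp(u) (trapezoid)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        u = lo + k * h
        x = math.exp(u)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * u * x * 2.0 * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return total * h


def tarch1_strictly_stationary(a_plus, a_minus, e_log_abs=-(EULER_GAMMA + math.log(2.0)) / 2.0):
    """Symmetric-noise TARCH(1): a_plus * a_minus < exp(-2 E log|eta|)."""
    return a_plus * a_minus < math.exp(-2.0 * e_log_abs)
```

Passing another value of `e_log_abs` adapts the check to any symmetric noise for which $E\log|\eta_t|$ is known.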


Exercise 10.8 shows that the second-order stationarity condition is

$$E[(\alpha_{1,+}\eta_t^+ - \alpha_{1,-}\eta_t^- + \beta_1)^2] < 1. \qquad (10.14)$$

This condition can be made explicit in terms of the first two moments of $\eta_t^+$ and $\eta_t^-$. For instance, if $\eta_t$ is $\mathcal{N}(0, 1)$ distributed, we get

$$\frac{1}{2}\left(\alpha_{1,+}^2 + \alpha_{1,-}^2\right) + \frac{2\beta_1}{\sqrt{2\pi}}(\alpha_{1,+} + \alpha_{1,-}) + \beta_1^2 < 1. \qquad (10.15)$$

Of course, the second-order stationarity condition is more restrictive than the strict stationarity condition (see Figure 10.2).
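Condition (10.15) follows from (10.14) using $E(\eta_t^+)^2 = E(\eta_t^-)^2 = 1/2$, $E\eta_t^+ = -E\eta_t^- = 1/\sqrt{2\pi}$ and $\eta_t^+\eta_t^- = 0$. The following minimal sketch compares the closed form with a direct numerical integration, using the trapezoid rule with a node at the kink 0. Names are hypothetical.

```python
import math


def tarch1_a2_closed(a_plus, a_minus, beta):
    """Left-hand side of (10.15): E(a_plus eta^+ - a_minus eta^- + beta)^2, eta ~ N(0, 1)."""
    return (0.5 * (a_plus ** 2 + a_minus ** 2)
            + 2.0 * beta * (a_plus + a_minus) / math.sqrt(2.0 * math.pi) + beta ** 2)


def tarch1_a2_quad(a_plus, a_minus, beta, lim=12.0, n=24000):
    """Same expectation by the trapezoidal rule (the grid contains the kink x = 0)."""
    h = 2.0 * lim / n
    total = 0.0
    for k in range(n + 1):
        x = -lim + k * h
        a = a_plus * max(x, 0.0) - a_minus * min(x, 0.0) + beta
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * a * a * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)


def tarch11_second_order(a_plus, a_minus, beta):
    """Second-order stationarity condition (10.14)-(10.15) for Gaussian eta."""
    return tarch1_a2_closed(a_plus, a_minus, beta) < 1.0
```

For example, $(\alpha_{1,+}, \alpha_{1,-}, \beta_1) = (1.5, 1.5, 0)$ violates (10.15). It still satisfies the strict stationarity condition $\alpha_{1,+}\alpha_{1,-} < 2e^{\gamma_E}$, which illustrates the gap between the two regions in Figure 10.2.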
Under the second-order stationarity condition, it is easily seen that the property of symmetry (10.1) is generally violated. For instance, if the distribution of $\eta_t$ is symmetric, we have, for the TARCH(1) model:

$$\mathrm{Cov}(\sigma_t, \epsilon_{t-1}) = \alpha_{1,+}E(\epsilon_{t-1}^+)^2 - \alpha_{1,-}E(\epsilon_{t-1}^-)^2 = (\alpha_{1,+} - \alpha_{1,-})\frac{E\sigma_t^2}{2} \neq 0$$

whenever $\alpha_{1,+} \neq \alpha_{1,-}$.

Figure 10.2 Stationarity regions for the TARCH(1) model with $\eta_t \sim \mathcal{N}(0, 1)$: 1, second-order stationarity; 1 and 2, strict stationarity; 3, nonstationarity.




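Strict Stationarity of the TGARCH(p, q) Model

The study of the general case relies on a representation analogous to (2.16), obtained by replacing, in the vector $\underline{z}_t$, the variables $\epsilon_{t-i}^2$ by $(\epsilon_{t-i}^+, -\epsilon_{t-i}^-)'$, the $\sigma_{t-i}^2$ by $\sigma_{t-i}$, and by an adequate modification of $\underline{b}_t$ and $A_t$. Specifically, using (10.11), we get

$$\underline{z}_t = \underline{b}_t + A_t\underline{z}_{t-1}, \qquad (10.16)$$

where

$$\underline{b}_t = \underline{b}(\eta_t) = \begin{pmatrix}\omega\eta_t^+\\ -\omega\eta_t^-\\ 0_{2q-2}\\ \omega\\ 0_{p-1}\end{pmatrix} \in \mathbb{R}^{p+2q}, \qquad \underline{z}_t = \begin{pmatrix}\epsilon_t^+\\ -\epsilon_t^-\\ \vdots\\ \epsilon_{t-q+1}^+\\ -\epsilon_{t-q+1}^-\\ \sigma_t\\ \vdots\\ \sigma_{t-p+1}\end{pmatrix} \in \mathbb{R}^{p+2q},$$

and

$$A_t = \begin{pmatrix}
\eta_t^+\alpha_{1:q-1} & \eta_t^+\alpha_{q,+} & \eta_t^+\alpha_{q,-} & \eta_t^+\beta_{1:p-1} & \eta_t^+\beta_p\\
-\eta_t^-\alpha_{1:q-1} & -\eta_t^-\alpha_{q,+} & -\eta_t^-\alpha_{q,-} & -\eta_t^-\beta_{1:p-1} & -\eta_t^-\beta_p\\
I_{2q-2} & 0_{2q-2} & 0_{2q-2} & 0_{(2q-2)\times(p-1)} & 0_{2q-2}\\
\alpha_{1:q-1} & \alpha_{q,+} & \alpha_{q,-} & \beta_{1:p-1} & \beta_p\\
0_{(p-1)\times(2q-2)} & 0_{p-1} & 0_{p-1} & I_{p-1} & 0_{p-1}
\end{pmatrix} \qquad (10.17)$$

is a matrix of size $(p + 2q)\times(p + 2q)$,

$$\alpha_{1:q-1} = (\alpha_{1,+}, \alpha_{1,-}, \dots, \alpha_{q-1,+}, \alpha_{q-1,-}) \in \mathbb{R}^{2q-2}, \qquad \beta_{1:p-1} = (\beta_1, \dots, \beta_{p-1}) \in \mathbb{R}^{p-1}.$$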

The following result is analogous to that obtained for the strict stationarity of the 
GARCH(p, q). 


Theorem 10.3 (Strict stationarity of the TGARCH(p, q) model) A necessary and sufficient condition for the existence of a strictly stationary and nonanticipative solution of the TGARCH(p, q) model (10.9)–(10.10) is that $\gamma < 0$, where $\gamma$ is the top Lyapunov exponent of the sequence $\{A_t, t \in \mathbb{Z}\}$ defined by (10.17).

This stationary and nonanticipative solution, when $\gamma < 0$, is unique and ergodic.


Proof. The sufficient part of the proof of Theorem 2.4 can be straightforwardly adapted. As for the necessary part, note that the coefficients of the matrices $A_t$, $\underline{b}_t$ and $\underline{z}_t$ are positive. This allows us to show, as was done previously, that $A_0\cdots A_{-k}\underline{b}_{-k-1}$ tends to 0 almost surely when $k \to \infty$. But since $\underline{b}_{-k-1} = \omega\eta_{-k-1}^+e_1 - \omega\eta_{-k-1}^-e_2 + \omega e_{2q+1}$, using the positivity, we have

$$\lim_{k\to\infty}A_0\cdots A_{-k}\,\omega\eta_{-k-1}^+e_1 = \lim_{k\to\infty}A_0\cdots A_{-k}\,(-\omega\eta_{-k-1}^-)e_2 = \lim_{k\to\infty}A_0\cdots A_{-k}\,\omega e_{2q+1} = 0, \quad \text{a.s.}$$

It follows that $\lim_{k\to\infty}A_0\cdots A_{-k}e_i = 0$ a.s. for $i = 1, \dots, 2q + 1$ by induction, as in the GARCH case.
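The top Lyapunov exponent $\gamma$ of Theorem 10.3 generally has no closed form. It is usually approximated by simulation, $\gamma \approx n^{-1}\log\|A_n\cdots A_1v\|$, with the vector renormalized at each step to avoid overflow or underflow. The sketch below is generic: `draw_matrix` is a hypothetical user-supplied function returning one random matrix $A_t$ as a list of rows. For a TGARCH(1, 1), one may work directly with the scalar coefficient $a(\eta_t)$ of (10.12), whose exponent is $E\log a(\eta_t)$, as in (10.13).

```python
import math
import random


def top_lyapunov(draw_matrix, dim, n=20000, seed=0):
    """Estimate gamma = lim (1/n) log ||A_n ... A_1 v|| with renormalization at each step."""
    rng = random.Random(seed)
    v = [1.0] * dim  # positive start vector (the A_t considered here have nonnegative entries)
    log_norm = 0.0
    for _ in range(n):
        A = draw_matrix(rng)
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
        norm = math.sqrt(sum(x * x for x in v))
        log_norm += math.log(norm)
        v = [x / norm for x in v]
    return log_norm / n


def tarch11_coefficient(a_plus, a_minus, beta):
    """Hypothetical helper: draws the 1x1 matrix a(eta) = a_plus eta^+ - a_minus eta^- + beta."""
    def draw(rng):
        eta = rng.gauss(0.0, 1.0)
        return [[a_plus * max(eta, 0.0) - a_minus * min(eta, 0.0) + beta]]
    return draw
```

For the general TGARCH(p, q), `draw_matrix` would build $A_t$ of (10.17) from one draw of $\eta_t$. Because the matrices are of size $p + 2q$, each step costs $O((p + 2q)^2)$ operations, which is the practical motivation for the reduced representation discussed next.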
Numerical evaluation, by means of simulation, of the Lyapunov coefficient $\gamma$ can be time-consuming because of the large size of the matrices $A_t$. A condition involving matrices of smaller dimensions can sometimes be obtained. Suppose that the asymmetric effects have a factorization of the form $\alpha_{i,-} = \theta\alpha_{i,+}$ for all lags $i = 1, \dots, q$. In this constrained model, the asymmetry is summarized by only one parameter $\theta \neq 1$, the case $\theta > 1$ giving more importance to the negative returns.


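Theorem 10.4 (Strict stationarity of the constrained TGARCH(p, q) model) A necessary and sufficient condition for the existence of a strictly stationary and nonanticipative solution of the TGARCH(p, q) model (10.9), in which the coefficients $\omega$, $\alpha_{i,-}$ and $\alpha_{i,+}$ satisfy the positivity conditions (10.10) and the $q - 1$ constraints

$$\alpha_{1,-} = \theta\alpha_{1,+}, \quad \alpha_{2,-} = \theta\alpha_{2,+}, \quad \dots, \quad \alpha_{q,-} = \theta\alpha_{q,+},$$

is that $\gamma^* < 0$, where $\gamma^*$ is the top Lyapunov exponent of the sequence of $(p + q - 1)\times(p + q - 1)$ matrices $\{A_t^*, t \in \mathbb{Z}\}$ defined by

$$A_t^* = \begin{pmatrix}
0_{q-1}' & \vartheta(\eta_{t-1}) & 0_{p-1}'\\
\begin{matrix}I_{q-2} & 0_{q-2}\end{matrix} & 0_{q-2} & 0_{(q-2)\times(p-1)}\\
\alpha_{2,+}\;\cdots\;\alpha_{q,+} & \alpha_{1,+}\vartheta(\eta_{t-1}) + \beta_1 & \beta_2\;\cdots\;\beta_p\\
0_{(p-1)\times(q-1)} & \multicolumn{2}{c}{\begin{matrix}I_{p-1} & 0_{p-1}\end{matrix}}
\end{pmatrix}, \qquad (10.18)$$

where $\vartheta(\eta_t) = \eta_t^+ - \theta\eta_t^-$. This stationary and nonanticipative solution, when $\gamma^* < 0$, is unique and ergodic.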


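Proof. If the constrained TGARCH model admits a stationary solution $(\sigma_t, \epsilon_t)$, then a stationary solution exists for the model

$$\underline{z}_t^* = \underline{b}_t^* + A_t^*\underline{z}_{t-1}^*, \qquad (10.19)$$

where

$$\underline{b}_t^* = \begin{pmatrix}0_{q-1}\\ \omega\\ 0_{p-1}\end{pmatrix} \in \mathbb{R}^{p+q-1}, \qquad \underline{z}_t^* = \begin{pmatrix}\epsilon_{t-1}^+ - \theta\epsilon_{t-1}^-\\ \vdots\\ \epsilon_{t-q+1}^+ - \theta\epsilon_{t-q+1}^-\\ \sigma_t\\ \vdots\\ \sigma_{t-p+1}\end{pmatrix} \in \mathbb{R}^{p+q-1}.$$

Conversely, if (10.19) admits a stationary solution, then the constrained TGARCH model admits the stationary solution $(\sigma_t, \epsilon_t)$ defined by $\sigma_t = \underline{z}_t^*(q)$ (the $q$th component of $\underline{z}_t^*$) and $\epsilon_t = \sigma_t\eta_t$. Thus the constrained TGARCH model admits a strictly stationary solution if and only if model (10.19) has a strictly stationary solution. It can be seen that $\lim_{k\to\infty}A_0^*\cdots A_{-k}^*\underline{b}_{-k-1}^* = 0$ implies $\lim_{k\to\infty}A_0^*\cdots A_{-k}^*e_i = 0$ for $i = 1, \dots, p + q - 1$, using the independence of the matrices in the product $A_0^*\cdots A_{-k}^*\underline{b}_{-k-1}^*$ and noting that, in the case where $\vartheta(\eta_t)$ is not almost surely equal to zero, the $q$th component of $\underline{b}_{-k-1}^*$, the first and $(q + 1)$th components of $A_{-k}^*\underline{b}_{-k-1}^*$, the second and $(q + 2)$th components of $A_{-k+1}^*A_{-k}^*\underline{b}_{-k-1}^*$, etc., are strictly positive with nonzero probability. In the case where $\vartheta(\eta_t) = 0$, the first $q - 1$ rows of $A_0^*\cdots A_{-q+2}^*$ are null, which obviously shows that $\lim_{k\to\infty}A_0^*\cdots A_{-k}^*e_i = 0$ for $i = 1, \dots, q - 1$. For $i = q, \dots, p + q - 1$, the argument used in the case $\vartheta(\eta_t) \neq 0$ remains valid. The rest of the proof is similar to that of Theorem 2.4.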
mth-Order Stationarity of the TGARCH(p, q) Model

Contrary to the standard GARCH model, the odd-order moments are not more difficult to obtain than the even-order ones for a TGARCH model. The existence condition for such moments is provided by the following theorem.

Theorem 10.5 (mth-order stationarity) Let $m$ be a positive integer. Suppose that $E(|\eta_t|^m) < \infty$. Let $\mathbb{A}^{(m)} = E(A_t^{\otimes m})$ where $A_t$ is defined by (10.16). If the spectral radius

$$\rho(\mathbb{A}^{(m)}) < 1,$$

then, for any $t \in \mathbb{Z}$, the infinite sum $(\underline{z}_t)$ is a strictly stationary solution of (10.16) which converges in $L^m$ and the process $(\epsilon_t)$, defined by $\epsilon_t = e_{2q+1}'\underline{z}_t\eta_t$, is a strictly stationary solution of the TGARCH(p, q) model defined by (10.9), and admits moments up to order $m$.

Conversely, if $\rho(\mathbb{A}^{(m)}) \geq 1$, there exists no strictly stationary solution $(\epsilon_t)$ of (10.9) satisfying the positivity conditions (10.10) and the moment condition $E(|\epsilon_t|^m) < \infty$.

The proof of this theorem is identical to that of Theorem 2.9.


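Kurtosis of the TGARCH(1, 1) Model

For the TGARCH(1, 1) model with positive coefficients, the condition for the existence of $E|\epsilon_t|^m$ can be obtained directly. Using the representation

$$\sigma_t = \omega + a(\eta_{t-1})\sigma_{t-1}, \qquad a(\eta) = \alpha_{1,+}\eta^+ - \alpha_{1,-}\eta^- + \beta_1,$$

we find that $E\sigma_t^m$ exists and satisfies

$$E\sigma_t^m = \sum_{k=0}^{m}\binom{m}{k}\omega^kEa^{m-k}(\eta_t)E\sigma_t^{m-k}$$

if and only if

$$Ea^m(\eta_t) < 1. \qquad (10.20)$$

If this condition is satisfied for $m = 4$, then the kurtosis coefficient exists. Moreover, if $\eta_t \sim \mathcal{N}(0, 1)$ we get

$$\kappa_\epsilon = \frac{E\epsilon_t^4}{(E\epsilon_t^2)^2} = 3\frac{E\sigma_t^4}{(E\sigma_t^2)^2},$$

and, using the notation $a_i = Ea^i(\eta_t)$, the moments can be computed successively as

$$a_1 = \frac{1}{\sqrt{2\pi}}(\alpha_{1,+} + \alpha_{1,-}) + \beta_1, \qquad E\sigma_t = \frac{\omega}{1 - a_1},$$

$$a_2 = \frac{1}{2}\left(\alpha_{1,+}^2 + \alpha_{1,-}^2\right) + \frac{2\beta_1}{\sqrt{2\pi}}(\alpha_{1,+} + \alpha_{1,-}) + \beta_1^2, \qquad E\sigma_t^2 = \frac{\omega^2(1 + a_1)}{(1 - a_1)(1 - a_2)},$$

$$a_3 = \frac{2}{\sqrt{2\pi}}\left(\alpha_{1,+}^3 + \alpha_{1,-}^3\right) + \frac{3\beta_1}{2}\left(\alpha_{1,+}^2 + \alpha_{1,-}^2\right) + \frac{3\beta_1^2}{\sqrt{2\pi}}(\alpha_{1,+} + \alpha_{1,-}) + \beta_1^3,$$

$$E\sigma_t^3 = \frac{\omega^3(1 + 2a_1 + 2a_2 + a_1a_2)}{(1 - a_1)(1 - a_2)(1 - a_3)},$$

$$a_4 = \frac{3}{2}\left(\alpha_{1,+}^4 + \alpha_{1,-}^4\right) + \frac{8\beta_1}{\sqrt{2\pi}}\left(\alpha_{1,+}^3 + \alpha_{1,-}^3\right) + 3\beta_1^2\left(\alpha_{1,+}^2 + \alpha_{1,-}^2\right) + \frac{4\beta_1^3}{\sqrt{2\pi}}(\alpha_{1,+} + \alpha_{1,-}) + \beta_1^4,$$

$$E\sigma_t^4 = \frac{\omega^4(1 + 3a_1 + 5a_2 + 3a_1a_2 + 3a_3 + 5a_1a_3 + 3a_2a_3 + a_1a_2a_3)}{(1 - a_1)(1 - a_2)(1 - a_3)(1 - a_4)}.$$

Many moments of the TGARCH(1, 1) can be obtained similarly, such as the autocorrelations of the absolute values (Exercise 10.9) and squares, but the calculations can be tedious.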
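The recursion (10.20) is straightforward to implement for any order $m$. This is a minimal sketch, assuming Gaussian $\eta_t$. In that case $E(\eta_t^+)^k = \tfrac12E|\eta_t|^k = 2^{k/2-1}\Gamma(\tfrac{k+1}{2})/\sqrt{\pi}$ and $a_i = \beta_1^i + \sum_{j=1}^{i}\binom{i}{j}\beta_1^{i-j}(\alpha_{1,+}^j + \alpha_{1,-}^j)E(\eta_t^+)^j$. The function names are hypothetical. A quick consistency check is to compare the recursion with the closed-form expression of $E\sigma_t^4$.

```python
import math


def pos_part_moment(k):
    """E[(eta^+)^k] = E|eta|^k / 2 for eta ~ N(0, 1)."""
    return 2.0 ** (k / 2.0 - 1.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)


def a_moment(i, a_plus, a_minus, beta):
    """a_i = E a^i(eta) with a(eta) = a_plus eta^+ - a_minus eta^- + beta."""
    return beta ** i + sum(math.comb(i, j) * beta ** (i - j) * (a_plus ** j + a_minus ** j)
                           * pos_part_moment(j) for j in range(1, i + 1))


def sigma_moments(m, omega, a_plus, a_minus, beta):
    """[E sigma_t^0, ..., E sigma_t^m] from (10.20); needs a_k < 1 for k = 1..m."""
    a = [a_moment(k, a_plus, a_minus, beta) for k in range(m + 1)]
    if any(a[k] >= 1.0 for k in range(1, m + 1)):
        raise ValueError("E a^k(eta) >= 1: the moment of order k does not exist")
    es = [1.0]
    for k in range(1, m + 1):
        rhs = sum(math.comb(k, j) * omega ** j * a[k - j] * es[k - j] for j in range(1, k + 1))
        es.append(rhs / (1.0 - a[k]))  # the j = 0 term a_k E sigma^k is moved to the left side
    return es


def tgarch11_kurtosis(omega, a_plus, a_minus, beta):
    """kappa_eps = 3 E sigma^4 / (E sigma^2)^2 for Gaussian eta."""
    es = sigma_moments(4, omega, a_plus, a_minus, beta)
    return 3.0 * es[4] / es[2] ** 2
```

Odd orders are handled exactly like even ones, which is the practical content of the remark opening the mth-order stationarity subsection.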


10.3 Asymmetric Power GARCH Model 


The following class is very general and contains the standard GARCH, the TGARCH, and the 
Log-GARCH. 


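Definition 10.3 (APARCH(p, q) process) Let $(\eta_t)$ be a sequence of iid variables such that $E(\eta_t) = 0$ and $\mathrm{Var}(\eta_t) = 1$. The process $(\epsilon_t)$ is called an asymmetric power GARCH(p, q) if it satisfies an equation of the form

$$\begin{cases}\epsilon_t = \sigma_t\eta_t\\ \sigma_t^\delta = \omega + \sum_{i=1}^{q}\alpha_i(|\epsilon_{t-i}| - \varsigma_i\epsilon_{t-i})^\delta + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^\delta,\end{cases} \qquad (10.21)$$

where $\omega > 0$, $\delta > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$ and $|\varsigma_i| \leq 1$.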
Remark 10.4 (On the APARCH model)

1. The standard GARCH(p, q) is obtained for $\delta = 2$ and $\varsigma_1 = \cdots = \varsigma_q = 0$.

2. To study the role of the parameter $\varsigma_i$, let us consider the simplest case, the asymmetric ARCH(1) model. We have

$$\sigma_t^2 = \begin{cases}\omega + \alpha_1(1 - \varsigma_1)^2\epsilon_{t-1}^2 & \text{if } \epsilon_{t-1} \geq 0,\\ \omega + \alpha_1(1 + \varsigma_1)^2\epsilon_{t-1}^2 & \text{if } \epsilon_{t-1} < 0.\end{cases} \qquad (10.22)$$

Hence, the choice of $\varsigma_1 > 0$ ensures that negative innovations have more impact on the current volatility than positive ones of the same modulus. Similarly, for more complex APARCH models, the constraint $\varsigma_i \geq 0$ is a natural way to capture the typical asymmetric property of financial series.

3. Since

$$\alpha_i\left||\epsilon_{t-i}| - \varsigma_i\epsilon_{t-i}\right|^\delta = \alpha_i|\varsigma_i|^\delta\left||\epsilon_{t-i}| - \varsigma_i^{-1}\epsilon_{t-i}\right|^\delta,$$

$|\varsigma_i| \leq 1$ is a nonrestrictive identifiability constraint.

4. If $\delta = 1$, the model reduces to the TGARCH model. Using $\log\sigma_t = \lim_{\delta\to0}(\sigma_t^\delta - 1)/\delta$, one can interpret the Log-GARCH model as the limit of the APARCH model when $\delta \to 0$. The novelty of the APARCH model is in the introduction of the parameter $\delta$. Note that autocorrelations of the absolute returns are often larger than autocorrelations of the squares. The introduction of the power $\delta$ increases the flexibility of GARCH-type models, and allows the a priori selection of an arbitrary power to be avoided.
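Items 1 and 4 can be checked numerically. For $\delta = 2$ and $\varsigma_i = 0$, the APARCH recursion is the GARCH recursion. For $\delta = 1$, it is a TGARCH with $\alpha_{i,+} = \alpha_i(1 - \varsigma_i)$ and $\alpha_{i,-} = \alpha_i(1 + \varsigma_i)$. This minimal sketch filters $\sigma_t$ from an observed series $(\epsilon_t)$ through (10.21). The pre-sample values $\epsilon = 0$ and $\sigma = $ `sigma0` are an assumption of this sketch, and the name `aparch_sigma` is hypothetical.

```python
def aparch_sigma(eps, omega, alphas, varsigmas, betas, delta, sigma0=1.0):
    """sigma_t, t = 0..n-1, from (10.21) given observed eps (pre-sample eps = 0, sigma = sigma0)."""
    q, p = len(alphas), len(betas)
    sig_d = []  # sigma_t^delta
    for t in range(len(eps)):
        s = omega
        for i in range(1, q + 1):
            e = eps[t - i] if t - i >= 0 else 0.0
            # |e| - varsigma * e >= 0 because |varsigma| <= 1
            s += alphas[i - 1] * (abs(e) - varsigmas[i - 1] * e) ** delta
        for j in range(1, p + 1):
            s += betas[j - 1] * (sig_d[t - j] if t - j >= 0 else sigma0 ** delta)
        sig_d.append(s)
    return [s ** (1.0 / delta) for s in sig_d]
```

The same function covers the intermediate powers $0 < \delta < 2$ that motivate the APARCH class.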


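Noting that $\{\epsilon_{t-i} > 0\} = \{\eta_{t-i} > 0\}$, one can write

$$\sigma_t^\delta = \omega + \sum_{i=1}^{\max\{p,q\}}a_i(\eta_{t-i})\sigma_{t-i}^\delta, \qquad (10.23)$$

where

$$a_i(z) = \alpha_i(|z| - \varsigma_iz)^\delta + \beta_i = \alpha_i(1 - \varsigma_i)^\delta|z|^\delta\mathbb{1}_{\{z > 0\}} + \alpha_i(1 + \varsigma_i)^\delta|z|^\delta\mathbb{1}_{\{z < 0\}} + \beta_i,$$

for $i = 1, \dots, \max\{p, q\}$.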


Stationarity of the APARCH(1, 1) Model 


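Relation (10.23) is an extension of (2.6) which allows us to obtain the stationarity conditions, as in the classical GARCH(1, 1) case. The necessary and sufficient strict stationarity condition is thus

$$E\log\left\{\alpha_1(1 - \varsigma_1)^\delta|\eta_t|^\delta\mathbb{1}_{\{\eta_t \geq 0\}} + \alpha_1(1 + \varsigma_1)^\delta|\eta_t|^\delta\mathbb{1}_{\{\eta_t < 0\}} + \beta_1\right\} < 0. \qquad (10.24)$$

For the APARCH(1, 0) model, we have

$$\log\left\{\alpha_1(1 - \varsigma_1)^\delta|\eta_t|^\delta\mathbb{1}_{\{\eta_t \geq 0\}} + \alpha_1(1 + \varsigma_1)^\delta|\eta_t|^\delta\mathbb{1}_{\{\eta_t < 0\}}\right\} = \log(1 - \varsigma_1)^\delta\mathbb{1}_{\{\eta_t \geq 0\}} + \log(1 + \varsigma_1)^\delta\mathbb{1}_{\{\eta_t < 0\}} + \log\alpha_1|\eta_t|^\delta,$$

showing that, if the distribution of $(\eta_t)$ is symmetric, the strict stationarity condition reduces to

$$\alpha_1\left(1 - \varsigma_1^2\right)^{\delta/2} < e^{-\delta E\log|\eta_t|}.$$

Note that in the limit case where $|\varsigma_1| = 1$, the model is strictly stationary for any value of $\alpha_1$, as might be expected. Under condition (10.24), the strictly stationary solution is given by

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^\delta = \omega + \sum_{k=1}^{\infty}a_1(\eta_{t-1})\cdots a_1(\eta_{t-k})\,\omega.$$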

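Assuming $E|\eta_t|^\delta < \infty$, the condition for the existence of $E|\epsilon_t|^\delta$ (and of $E\sigma_t^\delta$) is

$$Ea_1(\eta_t) = \alpha_1\left\{(1 - \varsigma_1)^\delta E\left(|\eta_t|^\delta\mathbb{1}_{\{\eta_t \geq 0\}}\right) + (1 + \varsigma_1)^\delta E\left(|\eta_t|^\delta\mathbb{1}_{\{\eta_t < 0\}}\right)\right\} + \beta_1 < 1, \qquad (10.25)$$

which reduces to

$$\frac{1}{2}\alpha_1\left\{(1 + \varsigma_1)^\delta + (1 - \varsigma_1)^\delta\right\}E|\eta_t|^\delta + \beta_1 < 1$$

when the distribution of $(\eta_t)$ is symmetric, with

$$E|\eta_t|^\delta = \frac{2^{\delta/2}}{\sqrt{\pi}}\Gamma\left(\frac{1 + \delta}{2}\right)$$

when $\eta_t$ is Gaussian ($\Gamma$ denoting the Euler gamma function). Figure 10.3 shows the strict and second-order stationarity regions of the APARCH(1, 0) model when $\eta_t$ is Gaussian.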
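For Gaussian $\eta_t$, both conditions are explicit. The strict stationarity condition uses $E\log|\eta_t| = -(\gamma_E + \log 2)/2$, where $\gamma_E$ is Euler's constant (a standard fact, not stated in the text). Condition (10.25) uses the expression of $E|\eta_t|^\delta$ just given. The following minimal sketch could be used to trace regions like those of Figure 10.3. Names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
E_LOG_ABS_ETA = -(EULER_GAMMA + math.log(2.0)) / 2.0  # E log|eta| for eta ~ N(0, 1)


def e_abs_pow(delta):
    """E|eta|^delta = 2^{delta/2} Gamma((1 + delta)/2) / sqrt(pi) for eta ~ N(0, 1)."""
    return 2.0 ** (delta / 2.0) * math.gamma((1.0 + delta) / 2.0) / math.sqrt(math.pi)


def aparch10_strict(alpha, varsigma, delta):
    """Symmetric eta: alpha (1 - varsigma^2)^{delta/2} < exp(-delta E log|eta|)."""
    return alpha * (1.0 - varsigma ** 2) ** (delta / 2.0) < math.exp(-delta * E_LOG_ABS_ETA)


def aparch_delta_moment(alpha, varsigma, beta, delta):
    """Symmetric eta, condition (10.25): E a_1(eta) < 1, i.e. E sigma_t^delta is finite."""
    ea1 = (0.5 * alpha * ((1.0 + varsigma) ** delta + (1.0 - varsigma) ** delta)
           * e_abs_pow(delta) + beta)
    return ea1 < 1.0
```

For $\delta = 2$ and $\varsigma_1 = 0$ the moment condition is the familiar $\alpha_1 + \beta_1 < 1$, while the strict stationarity bound of the ARCH(1) is $\alpha_1 < 2e^{\gamma_E} \approx 3.56$.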
Figure 10.3 Stationarity regions for the APARCH(1, 0) model with $\eta_t \sim \mathcal{N}(0, 1)$: 1, second-order stationarity; 1 and 2, strict stationarity; 3, nonstationarity.

Obviously, if $\delta > 2$, condition (10.25) is sufficient (but not necessary) for the existence of a strictly stationary and second-order stationary solution to the APARCH(1, 1) model. If $\delta < 2$, condition (10.25) is necessary (but not sufficient) for the existence of a second-order stationary solution.


10.4 Other Asymmetric GARCH Models 


Among other asymmetric GARCH models, which we will not study in detail, let us mention 
the qualitative threshold ARCH (QTARCH) model, and the quadratic GARCH model (QGARCH 
or GQARCH), generalizing Example 4.2 in Chapter 4. The first-order model of this class, the 
QGARCH(1, 1), is defined by 


$$\epsilon_t=\sigma_t\eta_t,\qquad\sigma_t^2=\omega+\alpha\epsilon_{t-1}^2+\varsigma\epsilon_{t-1}+\beta\sigma_{t-1}^2, \tag{10.26}$$

where $\eta_t$ is a strong white noise with unit variance.


Remark 10.5 (On the QGARCH(1,1) model) 


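1. The function $x\mapsto\alpha x^2+\varsigma x$ has its minimum at $x=-\varsigma/(2\alpha)$, and this minimum is $-\varsigma^2/(4\alpha)$. A condition ensuring the positivity of $\sigma_t^2$ is thus $\omega>\varsigma^2/(4\alpha)$. This can also be seen by writing

$$\sigma_t^2=\omega-\frac{\varsigma^2}{4\alpha}+\left(\sqrt{\alpha}\,\epsilon_{t-1}+\frac{\varsigma}{2\sqrt{\alpha}}\right)^2+\beta\sigma_{t-1}^2.$$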


2. The condition $\alpha+\beta<1$ is clearly necessary for the existence of a nonanticipative and second-order stationary solution (taking expectations in (10.26), with $E\epsilon_{t-1}=0$, gives $(1-\alpha-\beta)E\sigma_t^2=\omega>0$), but it seems difficult to prove that this condition suffices for the existence of a solution. Equation (10.26) cannot be easily expanded because of the presence of both $\epsilon_{t-1}^2=\sigma_{t-1}^2\eta_{t-1}^2$ and $\epsilon_{t-1}=\sigma_{t-1}\eta_{t-1}$. It is therefore not possible to obtain an explicit solution as a function of the $\eta_{t-i}$. This makes QGARCH models much less tractable than the asymmetric models studied in this chapter.
3. The asymmetric effect is taken into account through the coefficient $\varsigma$. A negative coefficient entails that negative returns have a bigger impact on the volatility of the next period than positive ones. A small price increase, such that the return lies between 0 and $-\varsigma/\alpha$ with $\varsigma<0$, can even produce less volatility than a zero return (the smallest volatility being obtained for a return equal to $-\varsigma/(2\alpha)$). This is a distinctive feature of this model, compared to the EGARCH, TGARCH or GJR-GARCH, for which, by appropriately constraining the parameters, the volatility at time $t$ is minimal in the absence of price movement at time $t-1$.
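Points 1 and 3 of the remark can be checked numerically on the news impact of the QGARCH(1, 1). In the Python sketch below, `qgarch_variance` implements one step of (10.26); the parameter values are hypothetical, chosen only to have $\varsigma<0$.

```python
def qgarch_variance(eps_prev, sigma2_prev, omega, alpha, varsigma, beta):
    """One step of (10.26): omega + alpha*eps^2 + varsigma*eps + beta*sigma^2."""
    return omega + alpha * eps_prev ** 2 + varsigma * eps_prev + beta * sigma2_prev


def qgarch_minimizer(alpha, varsigma):
    """Value of eps_{t-1} minimizing alpha*x^2 + varsigma*x, i.e. -varsigma/(2 alpha)."""
    return -varsigma / (2 * alpha)


def qgarch_is_positive(omega, alpha, varsigma):
    """Positivity condition of point 1: alpha > 0 and omega > varsigma^2/(4 alpha)."""
    return alpha > 0 and omega > varsigma ** 2 / (4 * alpha)


# Hypothetical illustrative values with varsigma < 0 (leverage-type asymmetry).
params = dict(omega=9e-6, alpha=0.07, varsigma=-9e-4, beta=0.85)
sigma2_prev = 1.2e-4

x_star = qgarch_minimizer(params["alpha"], params["varsigma"])  # about 0.0064
# A small positive return gives a smaller next-period variance than a zero return.
small_rise = qgarch_variance(0.003, sigma2_prev, **params)
no_move = qgarch_variance(0.0, sigma2_prev, **params)
```

With these values `small_rise < no_move`, illustrating point 3, while a negative return of a given size raises the variance more than a positive one of the same size.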


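Many other asymmetric GARCH models have been introduced. Complex asymmetric responses to past values may be considered. For instance, in the model

$$\sigma_t=\omega+\alpha|\epsilon_{t-1}|\mathbf{1}_{\{|\epsilon_{t-1}|\leq\gamma\}}+\alpha_+\epsilon_{t-1}\mathbf{1}_{\{\epsilon_{t-1}>\gamma\}}-\alpha_-\epsilon_{t-1}\mathbf{1}_{\{\epsilon_{t-1}<-\gamma\}},\qquad\omega,\alpha,\alpha_+,\alpha_-,\gamma>0,$$

asymmetry is only present for large innovations (whose amplitude is larger than the threshold $\gamma$).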
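A direct transcription of this volatility function shows where the asymmetry acts. The Python sketch below is ours and the parameter values used with it are hypothetical. Note that, written this way, $\sigma_t$ jumps at $\epsilon_{t-1}=\gamma$ unless $\alpha_+=\alpha$, and at $\epsilon_{t-1}=-\gamma$ unless $\alpha_-=\alpha$.

```python
def threshold_volatility(eps_prev, omega, alpha, alpha_plus, alpha_minus, gamma):
    """sigma_t of the model above: symmetric response when |eps_{t-1}| <= gamma,
    sign-dependent response beyond the threshold gamma."""
    if abs(eps_prev) <= gamma:
        return omega + alpha * abs(eps_prev)
    if eps_prev > gamma:
        return omega + alpha_plus * eps_prev
    # eps_prev < -gamma, so -alpha_minus * eps_prev is positive
    return omega - alpha_minus * eps_prev


# Hypothetical parameters: a large negative shock raises volatility more than a
# large positive one, while small shocks of either sign have the same effect.
kw = dict(omega=0.001, alpha=0.05, alpha_plus=0.03, alpha_minus=0.12, gamma=0.02)
```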


10.5 A GARCH Model with Contemporaneous Conditional Asymmetry


A common feature of the GARCH models studied up to now is the decomposition

$$\epsilon_t=\sigma_t\eta_t,$$

where $\sigma_t$ is a positive variable and $(\eta_t)$ is an iid process. The various models differ by the specification of $\sigma_t$ as a measurable function of the $\epsilon_{t-i}$ for $i>0$. This type of formulation implies several important restrictions:


(i) The process $(\epsilon_t)$ is a martingale difference.
(ii) The positive and negative parts of $\epsilon_t$ have the same volatility, up to a multiplicative factor.
(iii) The kurtosis and skewness of the conditional distribution of $\epsilon_t$ are constant.


Property (ii) is an immediate consequence of the equalities in (10.11). Property (iii) expresses the fact that the conditional law of $\epsilon_t$ has the same ‘shape’ (symmetric or asymmetric, unimodal or polymodal, with or without heavy tails) as the law of $\eta_t$.

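Empirically, these properties are generally not satisfied by financial time series. Estimated kurtosis and skewness coefficients of the conditional distribution often present large variations in time. Moreover, property (i) implies that $\mathrm{Cov}(\epsilon_t,z_{t-1})=0$ for any variable $z_{t-1}\in L^2$ which is a measurable function of the past of $\epsilon_t$. In particular, one must have

$$\forall h>0,\quad\mathrm{Cov}(\epsilon_t,\epsilon_{t-h}^+)=\mathrm{Cov}(\epsilon_t,\epsilon_{t-h}^-)=0 \tag{10.27}$$

or, equivalently (since $\epsilon_t=\epsilon_t^++\epsilon_t^-$),

$$\forall h>0,\quad\mathrm{Cov}(\epsilon_t^+,\epsilon_{t-h}^+)=\mathrm{Cov}(-\epsilon_t^-,\epsilon_{t-h}^+),\qquad\mathrm{Cov}(\epsilon_t^+,\epsilon_{t-h}^-)=\mathrm{Cov}(-\epsilon_t^-,\epsilon_{t-h}^-). \tag{10.28}$$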
We emphasize the difference between (10.27) and the characterization (10.2) of the asymmetry studied previously. When (10.27) does not hold, one can speak of contemporaneous asymmetry since the variables $\epsilon_t^+$ and $-\epsilon_t^-$, of the current date, do not have the same conditional distribution.

For the CAC index series, Table 10.2 completes Table 10.1, by providing the cross empirical 
autocorrelations of the positive and negative parts of the returns. 
Table 10.2 Empirical autocorrelations (CAC 40, for the period 1988–1998).

h                                          1        2        3        4        5        10       20       40
$\rho(\epsilon_t^+,\epsilon_{t-h}^+)$       0.037   −0.006   −0.013    0.029   −0.039*   0.017    0.023    0.001
$\rho(-\epsilon_t^-,\epsilon_{t-h}^+)$     −0.013   −0.035   −0.019   −0.025   −0.028   −0.007   −0.020    0.017
$\rho(\epsilon_t^+,-\epsilon_{t-h}^-)$      0.026    0.088*   0.135*   0.047*   0.088*   0.056*   0.049*   0.065*
$\rho(-\epsilon_t^-,-\epsilon_{t-h}^-)$     0.060*   0.074*   0.041*   0.070*   0.027    0.077*   0.015   −0.008*

* indicates parameters that are statistically significant at the 5% level, using $1/n$ as an approximation for the variance of the autocorrelations, for $n=2385$.


Without carrying out a formal test, comparison of rows 1 and 3 (or 2 and 4) shows that the 
leverage effect is present, whereas comparison of rows 3 and 4 shows that property (10.28) does 
not hold. 
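The entries of Table 10.2 are empirical cross-correlations between the positive and negative parts of the returns. A minimal Python sketch of such a computation follows. It assumes the usual estimator with full-sample means and divisor $n$ (which may differ in minor details from the one used for the table) and the star rule $|\hat\rho|>1.96/\sqrt{n}$ of the table footnote; the row labels are ours.

```python
import math


def sample_corr_lag(x, y, h):
    """Empirical correlation between x_t and y_{t-h} (full-sample means, divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((x[t] - mx) * (y[t - h] - my) for t in range(h, n)) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return cov / (sx * sy)


def table_10_2_rows(eps, h):
    """The four cross-correlations of Table 10.2 at lag h."""
    pos = [max(e, 0.0) for e in eps]       # eps_t^+
    neg_abs = [-min(e, 0.0) for e in eps]  # -eps_t^- (nonnegative)
    return {
        "rho(eps+_t, eps+_t-h)": sample_corr_lag(pos, pos, h),
        "rho(-eps-_t, eps+_t-h)": sample_corr_lag(neg_abs, pos, h),
        "rho(eps+_t, -eps-_t-h)": sample_corr_lag(pos, neg_abs, h),
        "rho(-eps-_t, -eps-_t-h)": sample_corr_lag(neg_abs, neg_abs, h),
    }


def significance_threshold(n):
    """Starring threshold 1.96/sqrt(n) (about 0.040 for n = 2385)."""
    return 1.96 / math.sqrt(n)
```

Comparing the third and fourth entries at a given lag is the informal check of (10.28) made in the text.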

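A class of GARCH-type models allowing the two kinds of asymmetry is defined as follows. Let

$$\epsilon_t=\sigma_{t,+}\eta_t^++\sigma_{t,-}\eta_t^-,\qquad t\in\mathbb{Z}, \tag{10.29}$$

where $(\eta_t)$ is centered, $\eta_t$ is independent of $\sigma_{t,+}$ and $\sigma_{t,-}$, and

$$\sigma_{t,+}=\alpha_{0,+}+\sum_{i=1}^{q}\left(\alpha_{i,+}^+\epsilon_{t-i}^+-\alpha_{i,+}^-\epsilon_{t-i}^-\right)+\sum_{j=1}^{p}\left(\beta_{j,+}^+\sigma_{t-j,+}+\beta_{j,+}^-\sigma_{t-j,-}\right),$$
$$\sigma_{t,-}=\alpha_{0,-}+\sum_{i=1}^{q}\left(\alpha_{i,-}^+\epsilon_{t-i}^+-\alpha_{i,-}^-\epsilon_{t-i}^-\right)+\sum_{j=1}^{p}\left(\beta_{j,-}^+\sigma_{t-j,+}+\beta_{j,-}^-\sigma_{t-j,-}\right),$$

where $\alpha_{i,\pm}^+,\alpha_{i,\pm}^-,\beta_{j,\pm}^+,\beta_{j,\pm}^-\geq 0$ and $\alpha_{0,+},\alpha_{0,-}>0$. Without loss of generality, it can be assumed that

$$E(\eta_t^+)=E(-\eta_t^-)=1.$$

As an immediate consequence of the positivity of $\sigma_{t,+}$ and $\sigma_{t,-}$, we obtain

$$\epsilon_t^+=\sigma_{t,+}\eta_t^+\quad\text{and}\quad\epsilon_t^-=\sigma_{t,-}\eta_t^-, \tag{10.30}$$

which will be crucial for the study of this model.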

Thus, $\sigma_{t,+}$ and $\sigma_{t,-}$ can be interpreted as the volatilities of the positive and negative parts of the noise (up to a multiplicative constant, since we did not specify the variances of $\eta_t^+$ and $\eta_t^-$). In general, the nonanticipative solution of this model, when it exists, is not a martingale difference because

$$E(\epsilon_t\mid\epsilon_{t-1},\ldots)=(\sigma_{t,+}-\sigma_{t,-})E(\eta_t^+)\neq 0.$$

An exception is of course the situation where the parameters of the dynamics of $\sigma_{t,+}$ and $\sigma_{t,-}$ coincide, in which case we obtain model (10.9).
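To make the first-order version of (10.29) concrete, here is a Python sketch of one step of the volatility recursions with $p=q=1$, together with the conditional mean $\sigma_{t,+}E(\eta_t^+)+\sigma_{t,-}E(\eta_t^-)$ for a discrete law of $\eta_t$. The parameter layout and the two-point law $\eta_t=\pm2$ (which is centered and satisfies $E(\eta_t^+)=E(-\eta_t^-)=1$) are hypothetical choices for illustration.

```python
def next_volatilities(eps_prev, sig_plus_prev, sig_minus_prev, par_plus, par_minus):
    """One step of the (10.29) volatility recursions with p = q = 1.

    Each par_* is (alpha0, alpha_pos, alpha_neg, beta_pos, beta_neg) for sigma_{t,+} or
    sigma_{t,-}: alpha0 + alpha_pos*eps^+ - alpha_neg*eps^- + beta_pos*sigma_+ + beta_neg*sigma_-."""
    e_pos, e_neg = max(eps_prev, 0.0), min(eps_prev, 0.0)

    def one(par):
        a0, a_pos, a_neg, b_pos, b_neg = par
        return a0 + a_pos * e_pos - a_neg * e_neg + b_pos * sig_plus_prev + b_neg * sig_minus_prev

    return one(par_plus), one(par_minus)


def conditional_mean(sig_plus, sig_minus, eta_values, eta_probs):
    """E(eps_t | past) = sigma_+ E(eta^+) + sigma_- E(eta^-) for a discrete law of eta_t."""
    return sum(p * (sig_plus * max(v, 0.0) + sig_minus * min(v, 0.0))
               for v, p in zip(eta_values, eta_probs))


# Hypothetical two-point law: E(eps_t | past) = sigma_+ - sigma_-, nonzero when they differ.
eta_values, eta_probs = [-2.0, 2.0], [0.5, 0.5]
```

With identical parameters for the two recursions and equal starting values, `next_volatilities` returns $\sigma_{t,+}=\sigma_{t,-}$, which is the exceptional case mentioned above.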
A simple computation shows that the kurtosis coefficient of the conditional law of $\epsilon_t$ is given by

$$\kappa_t=\frac{\sum_{k=0}^{4}\binom{4}{k}\sigma_{t,+}^k\sigma_{t,-}^{4-k}\,c(k,4-k)}{\left\{\sum_{k=0}^{2}\binom{2}{k}\sigma_{t,+}^k\sigma_{t,-}^{2-k}\,c(k,2-k)\right\}^2}, \tag{10.31}$$

where $c(k,l)=E[\{\eta_t^+-E(\eta_t^+)\}^k\{\eta_t^--E(\eta_t^-)\}^l]$, provided that $E(\eta_t^4)<\infty$. A similar computation can be done for the conditional skewness, showing that the shape of the conditional distribution varies in time, whereas it is constant in the classical GARCH models (property (iii)).
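Formula (10.31) can be checked against a direct computation. The Python sketch below uses a hypothetical centered three-point law for $\eta_t$, so that every expectation is a finite sum, and compares (10.31) with the kurtosis of $\epsilon_t=\sigma_{t,+}\eta_t^++\sigma_{t,-}\eta_t^-$ computed from its exact distribution. The two agree because $\epsilon_t-E(\epsilon_t\mid\ldots)=\sigma_{t,+}\{\eta_t^+-E(\eta_t^+)\}+\sigma_{t,-}\{\eta_t^--E(\eta_t^-)\}$.

```python
import math


def central_cross_moment(a, b, probs, k, l):
    """c(k, l) = E[(A - EA)^k (B - EB)^l] for discrete A, B on a common probability space."""
    ea = sum(p * x for x, p in zip(a, probs))
    eb = sum(p * y for y, p in zip(b, probs))
    return sum(p * (x - ea) ** k * (y - eb) ** l for x, y, p in zip(a, b, probs))


def kurtosis_formula(sig_plus, sig_minus, eta_values, probs):
    """Conditional kurtosis (10.31)."""
    a = [max(v, 0.0) for v in eta_values]  # eta_t^+
    b = [min(v, 0.0) for v in eta_values]  # eta_t^-

    def c(k, l):
        return central_cross_moment(a, b, probs, k, l)

    num = sum(math.comb(4, k) * sig_plus ** k * sig_minus ** (4 - k) * c(k, 4 - k) for k in range(5))
    den = sum(math.comb(2, k) * sig_plus ** k * sig_minus ** (2 - k) * c(k, 2 - k) for k in range(3))
    return num / den ** 2


def kurtosis_direct(sig_plus, sig_minus, eta_values, probs):
    """Kurtosis of eps = sig_plus*eta^+ + sig_minus*eta^- from its exact discrete law."""
    eps = [sig_plus * max(v, 0.0) + sig_minus * min(v, 0.0) for v in eta_values]
    m = sum(p * e for e, p in zip(eps, probs))
    m2 = sum(p * (e - m) ** 2 for e, p in zip(eps, probs))
    m4 = sum(p * (e - m) ** 4 for e, p in zip(eps, probs))
    return m4 / m2 ** 2


# Hypothetical centered law: -2*0.3 + 0.5*0.45 + 1.5*0.25 = 0.
eta_values, probs = [-2.0, 0.5, 1.5], [0.3, 0.45, 0.25]
```

When $\sigma_{t,+}=\sigma_{t,-}$ the kurtosis no longer depends on the common value, as in the classical case; otherwise it moves with the ratio $\sigma_{t,+}/\sigma_{t,-}$.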

Methods analogous to those developed for the other GARCH models allow us to obtain existence conditions for the stationary and nonanticipative solutions (references are given at the end of the chapter). In contrast to the GARCH models analyzed previously, the stationary solution $(\epsilon_t)$ is not always a white noise.


10.6 Empirical Comparisons of Asymmetric GARCH Formulations


We will restrict ourselves to the simplest versions of the GARCH models introduced in this chapter, and consider their fit to the series of CAC 40 index returns, $r_t$, over the period 1988–1998, consisting of 2385 values.


Descriptive Statistics 


Figure 10.4 displays the first 500 values of the series. The volatility clustering phenomenon is clearly evident. The left-hand correlogram in Figure 10.5 indicates an absence of autocorrelation in the returns. However, squared returns present significant autocorrelations, which is another sign that the returns are not independent. Ljung–Box portmanteau tests, such as those available in SAS (see Table 10.3; Chapter 5 gives more details on these tests), confirm the visual analysis provided by the correlograms. The left-hand graph of Figure 10.6, compared to the right-hand graph of Figure 10.5, seems to indicate that the absolute returns are slightly more strongly correlated than the squares. The right-hand graph of Figure 10.6 displays empirical correlations between the series $|r_t|$ and $r_{t-h}$. It can be seen that these correlations are negative, which reveals the presence of leverage effects (more accentuated, apparently, for lags 2 and 3 than for lag 1).
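The portmanteau statistics of Table 10.3 are of Ljung–Box type, $Q_m=n(n+2)\sum_{h=1}^{m}\hat\rho^2(h)/(n-h)$, compared with a $\chi^2_m$ distribution. A standard-library Python sketch follows. Since the lags of Table 10.3 give even degrees of freedom, the $\chi^2$ tail probability can be computed from the closed form $P(\chi^2_{2k}>q)=e^{-q/2}\sum_{j=0}^{k-1}(q/2)^j/j!$; this shortcut and the function names are our choices, not those of SAS.

```python
import math


def sample_acf(x, h):
    """Empirical autocorrelation at lag h (full-sample mean, divisor sum of squares)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / c0


def ljung_box(x, max_lag):
    """Ljung-Box statistic Q_m = n(n+2) sum_{h=1}^{m} rho(h)^2 / (n - h)."""
    n = len(x)
    return n * (n + 2) * sum(sample_acf(x, h) ** 2 / (n - h) for h in range(1, max_lag + 1))


def chi2_sf_even_df(q, df):
    """P(chi2_df > q); the closed form used here is valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = q / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # (q/2)^j / j!
        total += term
    return math.exp(-half) * total
```

For instance, `chi2_sf_even_df(16.99, 12)` is about 0.150, in line with the 0.1499 reported at lag 12 in the upper panel of Table 10.3.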
Figure 10.4 The first 500 values of the CAC 40 index (left) and of the squared index (right).
Figure 10.5 Correlograms of the CAC 40 index (left) and the squared index (right). Dashed lines correspond to $\pm1.96/\sqrt{n}$.
Table 10.3 Portmanteau test of the white noise hypothesis for the CAC 40 series (upper panel) and for the squared index (lower panel).

Autocorrelation Check for White Noise

 To     Chi-                 Pr >
Lag     Square        DF    ChiSq     ------------------ Autocorrelations ------------------
  6     [illegible]    6    0.0737     0.030   0.005  -0.032   0.028  -0.046  -0.001
 12     16.99         12    0.1499    -0.018  -0.014   0.034   0.016   0.017   0.010
 18     [illegible]   18    0.2685    -0.005   0.025  -0.032  -0.009  -0.003   0.006
 24     27.20         24    0.2954    -0.023   0.003  -0.010   0.030  -0.047  -0.015

Autocorrelation Check for White Noise

 To     Chi-                 Pr >
Lag     Square        DF    ChiSq     ------------------ Autocorrelations ------------------
  6     165.90         6    <.0001     0.129   0.127   0.127   0.084   0.101   0.074
 12     222.93        12    <.0001     0.051   0.060   0.070   0.092   0.058   0.030
 18     [illegible]   18    <.0001     0.053   0.036   0.020   0.041   0.002   0.013
 24     240.04        24    <.0001     0.006   0.024   0.013   0.003   0.001  -0.002
Figure 10.6 Correlogram $h\mapsto\hat\rho(|r_t|,|r_{t-h}|)$ of the absolute CAC 40 returns (left) and cross correlogram $h\mapsto\hat\rho(|r_t|,r_{t-h})$ measuring the leverage effects (right).


Fit by Symmetric and Asymmetric GARCH Models 


We will consider the classical GARCH(1, 1) model and the simplest asymmetric models (which are the most widely used). Using the AUTOREG and MODEL procedures of SAS, the estimated models are as follows (estimated standard errors are given in parentheses under the coefficients):


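GARCH(1, 1) model

$$r_t=\underset{(2\times10^{-4})}{5\times10^{-4}}+\epsilon_t,\qquad\epsilon_t=\sigma_t\eta_t,\qquad\eta_t\sim\mathcal{N}(0,1),$$
$$\sigma_t^2=\underset{(2\times10^{-6})}{8\times10^{-6}}+\underset{(0.02)}{0.09}\,\epsilon_{t-1}^2+\underset{(0.02)}{0.84}\,\sigma_{t-1}^2. \tag{10.32}$$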
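EGARCH(1, 1) model

$$r_t=\underset{(2\times10^{-4})}{4\times10^{-4}}+\epsilon_t,\qquad\epsilon_t=\sigma_t\eta_t,\qquad\eta_t\sim\mathcal{N}(0,1),$$
$$\log\sigma_t^2=\underset{(0.15)}{-0.64}+\underset{(0.03)}{0.15}\left(\underset{(0.14)}{-0.53}\,\eta_{t-1}+|\eta_{t-1}|-\sqrt{2/\pi}\right)+\underset{(0.02)}{0.93}\log\sigma_{t-1}^2. \tag{10.33}$$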
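QGARCH(1, 1) model

$$r_t=\underset{(2\times10^{-4})}{3\times10^{-4}}+\epsilon_t,\qquad\epsilon_t=\sigma_t\eta_t,\qquad\eta_t\sim\mathcal{N}(0,1),$$
$$\sigma_t^2=\underset{(2\times10^{-6})}{9\times10^{-6}}+\underset{(0.01)}{0.07}\,\epsilon_{t-1}^2-\underset{(2\times10^{-4})}{9\times10^{-4}}\,\epsilon_{t-1}+\underset{(0.03)}{0.85}\,\sigma_{t-1}^2. \tag{10.34}$$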
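GJR-GARCH(1, 1) model

$$r_t=\underset{(2\times10^{-4})}{4\times10^{-4}}+\epsilon_t,\qquad\epsilon_t=\sigma_t\eta_t,\qquad\eta_t\sim\mathcal{N}(0,1),$$
$$\sigma_t^2=\underset{(2\times10^{-6})}{1\times10^{-5}}+\underset{(0.02)}{0.13}\,\epsilon_{t-1}^2-\underset{(0.02)}{0.10}\,\epsilon_{t-1}^2\mathbf{1}_{\{\epsilon_{t-1}>0\}}+\underset{(0.03)}{0.84}\,\sigma_{t-1}^2. \tag{10.35}$$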
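TGARCH(1, 1) model

$$r_t=\underset{(2\times10^{-4})}{4\times10^{-4}}+\epsilon_t,\qquad\epsilon_t=\sigma_t\eta_t,\qquad\eta_t\sim\mathcal{N}(0,1),$$
$$\sigma_t=\underset{(2\times10^{-4})}{8\times10^{-4}}+\underset{(0.01)}{0.03}\,\epsilon_{t-1}^+-\underset{(0.02)}{0.12}\,\epsilon_{t-1}^-+\underset{(0.02)}{0.87}\,\sigma_{t-1}. \tag{10.36}$$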


Interpretation of the Estimated Coefficients 


Note that all the estimated models are stationary. The standard GARCH(1, 1) admits a fourth-order moment since, in view of the computation on page 45, we have $3\hat\alpha^2+\hat\beta^2+2\hat\alpha\hat\beta<1$. It is thus possible to compute the variance and kurtosis in this estimated model (which are respectively equal to $1.3\times10^{-4}$ and 3.49 for the standard GARCH(1, 1)). Given the ARMA(1, 1) representation for $\epsilon_t^2$, we have $\rho_{\epsilon^2}(h)=(\hat\alpha+\hat\beta)\rho_{\epsilon^2}(h-1)$ for any $h>1$. Since $\hat\alpha+\hat\beta=0.09+0.84$ is close to 1, the decay of $\rho_{\epsilon^2}(h)$ to zero will be slow when $h\to\infty$, which can be interpreted as a sign of strong persistence of shocks.³
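For a Gaussian GARCH(1, 1), the variance is $\omega/(1-\alpha-\beta)$, the kurtosis is $3\{1-(\alpha+\beta)^2\}/\{1-(\alpha+\beta)^2-2\alpha^2\}$ when $3\alpha^2+\beta^2+2\alpha\beta<1$, and $\rho_{\epsilon^2}(1)=\alpha(1-\alpha\beta-\beta^2)/(1-2\alpha\beta-\beta^2)$. The Python sketch below plugs in the rounded estimates of (10.32). With these rounded values it gives a variance of about $1.14\times10^{-4}$ and a kurtosis of about 3.41, slightly below the figures quoted above; the gap presumably comes from the rounding of the coefficients.

```python
def garch11_moments(omega, alpha, beta):
    """Variance and kurtosis of a Gaussian GARCH(1,1)."""
    persistence = alpha + beta
    if persistence >= 1:
        raise ValueError("no finite variance: alpha + beta >= 1")
    variance = omega / (1 - persistence)
    # denom > 0 is the fourth-moment condition 3 alpha^2 + beta^2 + 2 alpha beta < 1
    denom = 1 - persistence ** 2 - 2 * alpha ** 2
    kurtosis = 3 * (1 - persistence ** 2) / denom if denom > 0 else float("inf")
    return variance, kurtosis


def garch11_acf_squares(alpha, beta, max_lag):
    """rho_{eps^2}(h), h = 1..max_lag, using rho(h) = (alpha + beta) rho(h - 1) for h > 1."""
    rho1 = alpha * (1 - alpha * beta - beta ** 2) / (1 - 2 * alpha * beta - beta ** 2)
    return [rho1 * (alpha + beta) ** (h - 1) for h in range(1, max_lag + 1)]


variance, kurtosis = garch11_moments(8e-6, 0.09, 0.84)
acf = garch11_acf_squares(0.09, 0.84, 10)  # slow geometric decay at rate 0.93
```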

Note that in the EGARCH model the parameter $\theta=-0.53$ is negative, implying the presence of the leverage effect. A similar interpretation can be given to the negative sign of the coefficient of $\epsilon_{t-1}$ in the QGARCH model, and to that of $\epsilon_{t-1}^2\mathbf{1}_{\{\epsilon_{t-1}>0\}}$ in the GJR-GARCH model. In the TGARCH model, the leverage effect is present since $\alpha_{1,-}>\alpha_{1,+}>0$.

The TGARCH model seems easier to interpret than the other asymmetric models. The volatility (that is, the conditional standard deviation) is the sum of four terms. The first is the intercept $\omega=8\times10^{-4}$. The term $\omega/(1-\beta_1)=0.006$ can be interpreted as a ‘minimal volatility’, obtained by assuming that all the innovations are equal to zero. The next two terms represent the impact of the last observation, distinguishing the sign of this observation, on the current volatility. In the estimated model, the impact of a positive value is 3.5 times less than that of a negative one. The last coefficient measures the importance of the last volatility. Even in the absence of news, the decay of the volatility is slow because the coefficient $\beta_1=0.87$ is rather close to 1.

Table 10.4 Likelihoods of the different models for the CAC 40 series.

              GARCH    EGARCH    QGARCH    GJR-GARCH    TGARCH
$\log L_n$    7393     7404      7404      7406         7405

³ In the strict sense, and for any reasonable specification, shocks are nonpersistent because the effect of a shock $\eta_t$ on the volatility $\sigma_{t+h}^2$ tends to 0 a.s. as $h\to\infty$, but we wish to express the fact that, in some sense, the decay to 0 is slow.
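The interpretation of (10.36) can be reproduced by iterating the volatility equation. In this Python sketch (our own helper, using the rounded estimates), all innovations are set to zero, so that $\sigma_t$ converges to $\omega/(1-\beta_1)\approx0.006$. With the rounded coefficients the ratio of the impacts of a negative and a positive shock of the same size is $0.12/0.03=4$; the value 3.5 in the text presumably comes from the unrounded estimates.

```python
def tgarch_sigma(eps_prev, sigma_prev, omega, alpha_plus, alpha_minus, beta):
    """TGARCH(1,1): sigma_t = omega + alpha_+ eps^+_{t-1} - alpha_- eps^-_{t-1} + beta sigma_{t-1}."""
    return (omega + alpha_plus * max(eps_prev, 0.0)
            - alpha_minus * min(eps_prev, 0.0) + beta * sigma_prev)


OMEGA, A_PLUS, A_MINUS, BETA = 8e-4, 0.03, 0.12, 0.87  # rounded estimates of (10.36)

# 'Minimal volatility': iterate with zero innovations from an arbitrary starting value.
sigma = 0.02
for _ in range(300):
    sigma = tgarch_sigma(0.0, sigma, OMEGA, A_PLUS, A_MINUS, BETA)

# Impact of a +/-1% return on next-period volatility, relative to no movement.
base = tgarch_sigma(0.0, 0.01, OMEGA, A_PLUS, A_MINUS, BETA)
up = tgarch_sigma(0.01, 0.01, OMEGA, A_PLUS, A_MINUS, BETA) - base
down = tgarch_sigma(-0.01, 0.01, OMEGA, A_PLUS, A_MINUS, BETA) - base
```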


Likelihood Comparisons 


Table 10.4 gives the log-likelihood, $\log L_n$, of the observations for the different models. One cannot directly compare the log-likelihood of the standard GARCH(1, 1) model, which has one parameter less, with that of the other models, but the log-likelihoods of the asymmetric models, which all have five parameters, can be compared. The largest likelihood is observed for the GJR threshold model, but, the difference being very slight, it is not clear that this model is really superior to the others.


Resemblances between the Estimated Volatilities 


Figure 10.7 shows that the estimated volatilities for the five models are very similar. It follows 
that the different specifications produce very similar prediction intervals (see Figure 10.8). 


Distances between Estimated Models 


Differences can, however, be discerned between the various specifications. Table 10.5 gives an insight into the distances between the estimated volatilities for the different models. From this point of view, the TGARCH and EGARCH models are very close, and are also the most distant from the standard GARCH. The QGARCH model is the closest to the standard GARCH. Rather surprisingly, the TGARCH and GJR-GARCH models appear quite different, although the GJR-GARCH is a threshold model for the conditional variance and the TGARCH is a similar model for the conditional standard deviation.
Figure 10.9 confirms the results of Table 10.5. The left-hand scatterplot shows

$$\left(\hat\sigma_{t,\mathrm{TGARCH}}^2-\hat\sigma_{t,\mathrm{GARCH}}^2,\;\hat\sigma_{t,\mathrm{EGARCH}}^2-\hat\sigma_{t,\mathrm{GARCH}}^2\right),\qquad t=1,\ldots,n,$$

and the right-hand one

$$\left(\hat\sigma_{t,\mathrm{TGARCH}}^2-\hat\sigma_{t,\mathrm{GARCH}}^2,\;\hat\sigma_{t,\mathrm{GJR\text{-}GARCH}}^2-\hat\sigma_{t,\mathrm{GARCH}}^2\right),\qquad t=1,\ldots,n.$$

The left-hand graph shows that the difference between the estimated volatilities of the TGARCH and the standard GARCH, $\hat\sigma_{t,\mathrm{TGARCH}}^2-\hat\sigma_{t,\mathrm{GARCH}}^2$, is always very close to the difference between the estimated volatilities of the EGARCH and the standard GARCH, $\hat\sigma_{t,\mathrm{EGARCH}}^2-\hat\sigma_{t,\mathrm{GARCH}}^2$ (the difference from the standard GARCH is introduced to make the graphs more readable). The right-hand graph shows much larger differences between the TGARCH and GJR-GARCH specifications.
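The distances in Table 10.5 are means, over the sample, of squared differences between two estimated volatility series. A minimal Python sketch of such a matrix is given below; the input format (a dictionary of equally long series) is an assumption, and the series could be $\hat\sigma_t$ or $\hat\sigma_t^2$ depending on the convention adopted.

```python
from itertools import combinations


def mean_squared_distances(vols):
    """vols: dict mapping a model name to its estimated volatility series (equal lengths).

    Returns a dict (name_a, name_b) -> mean over t of (vol_a[t] - vol_b[t])**2,
    filled symmetrically, with zeros on the diagonal."""
    out = {(name, name): 0.0 for name in vols}
    for a, b in combinations(vols, 2):
        va, vb = vols[a], vols[b]
        d = sum((x - y) ** 2 for x, y in zip(va, vb)) / len(va)
        out[(a, b)] = out[(b, a)] = d
    return out
```

Multiplying the result by $10^{10}$ gives numbers on the scale used in Table 10.5.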


Comparison between Implied and Sample Values of the Persistence and of the Leverage Effect


We now wish to compare, for the different models, the theoretical autocorrelations $\rho(|r_t|,|r_{t-h}|)$ and $\rho(|r_t|,r_{t-h})$ to the empirical ones. The theoretical autocorrelations being difficult, if not impossible, to obtain analytically, we used simulations of the estimated models to approximate these theoretical autocorrelations by their empirical counterparts. The length of the simulations, 50000, seemed sufficient to obtain good accuracy (this was confirmed by comparing the empirical and theoretical values when the latter were available).

Figure 10.7 From left to right and top to bottom, graph of the first 500 values of the CAC 40 index and estimated volatilities ($\times10^4$) for the GARCH(1, 1), EGARCH(1, 1), QGARCH(1, 1), GJR-GARCH(1, 1) and TGARCH(1, 1) models.
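The simulation approach can be sketched as follows for the TGARCH(1, 1), with Gaussian $\eta_t$ and the rounded estimates of (10.36). The burn-in length, the seed and the initialization at $E\sigma_t=\omega/\{1-\beta_1-(\alpha_{1,+}+\alpha_{1,-})/\sqrt{2\pi}\}$ are our own choices; the empirical correlations of a long simulated path then approximate $\rho(|r_t|,|r_{t-h}|)$ and $\rho(|r_t|,r_{t-h})$.

```python
import math
import random


def simulate_tgarch(n, omega, alpha_plus, alpha_minus, beta, mu=0.0, burn=1000, seed=0):
    """Simulate r_t = mu + sigma_t eta_t with TGARCH(1,1) volatility and eta_t ~ N(0,1)."""
    rng = random.Random(seed)
    # Start at E(sigma_t) for Gaussian eta (E eta^+ = 1/sqrt(2 pi)).
    sigma = omega / (1 - beta - (alpha_plus + alpha_minus) / math.sqrt(2 * math.pi))
    eps_prev, r = 0.0, []
    for t in range(n + burn):
        sigma = (omega + alpha_plus * max(eps_prev, 0.0)
                 - alpha_minus * min(eps_prev, 0.0) + beta * sigma)
        eps_prev = sigma * rng.gauss(0.0, 1.0)
        if t >= burn:  # discard the burn-in period
            r.append(mu + eps_prev)
    return r


def corr_lag(x, y, h):
    """Empirical correlation between x_t and y_{t-h} (full-sample means, divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return sum((x[t] - mx) * (y[t - h] - my) for t in range(h, n)) / (n * sx * sy)


r = simulate_tgarch(50000, 8e-4, 0.03, 0.12, 0.87, mu=4e-4, seed=1)
abs_r = [abs(v) for v in r]
leverage = [corr_lag(abs_r, r, h) for h in (1, 2, 3)]         # expected negative
persistence = [corr_lag(abs_r, abs_r, h) for h in (1, 2, 3)]  # expected positive
```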

Figure 10.10 shows satisfactory results for the standard GARCH model, as far as the autocorrelations of absolute values are concerned. Of course, this model is not able to reproduce the correlations induced by the leverage effect. Such autocorrelations are adequately reproduced by the TGARCH model, as can be seen from the top and bottom right panels. The autocorrelations for the other asymmetric models are not reproduced here but are very similar to those of the TGARCH. The negative correlations between $|r_t|$ and the $r_{t-h}$ appear similar to the empirical ones.


Implied and Empirical Kurtosis 


Table 10.6 shows that the theoretical variances obtained from the estimated models are close to the observed variance of the CAC 40 index. In contrast, the kurtosis coefficients implied by the estimated models are all much lower than the observed value.
Figure 10.8 Returns $r_t$ of the CAC 40 index (solid lines) and confidence intervals $\bar r\pm3\hat\sigma_t$ (dotted lines), where $\bar r$ is the empirical mean of the returns over the whole period 1988–1998 and $\hat\sigma_t$ is the estimated volatility in the standard GARCH(1, 1) model (left) and in the EGARCH(1, 1) model (right).


Table 10.5 Means of the squared differences between the estimated volatilities ($\times10^{10}$).


GARCH EGARCH QGARCH GJR TGARCH 
GARCH 0 10.98 3.58 7.64 12.71 
EGARCH 10.98 0 3.64 6.47 1.05 
QGARCH 3.58 3.64 0 3.25 4.69 
GJR 7.64 6.47 3.25 0 9.03 
TGARCH 12.71 1.05 4.69 9.03 0 


Figure 10.9 Comparison of the estimated volatilities of the EGARCH and TGARCH models (left), and of the TGARCH and GJR-GARCH models (right). The estimated volatilities are close when the scatterplot is elongated (see text).


In all these five models, the conditional distribution of the standardized returns $\eta_t$ is assumed to be $\mathcal{N}(0,1)$. This choice may be inadequate, which could explain the discrepancy between the theoretical kurtosis of the estimated models and the empirical kurtosis. Moreover, the normality assumption is clearly rejected by statistical tests, such as the Kolmogorov–Smirnov test, applied to the standardized returns, whose distribution is leptokurtic.

Table 10.7 reveals a large number of returns outside the interval $[\bar r-3\hat\sigma_t,\bar r+3\hat\sigma_t]$, whatever the specification used for $\hat\sigma_t$. If the conditional law were Gaussian and if the conditional variance were correctly specified, the probability of one return falling outside the interval would be $2\{1-\Phi(3)\}=0.0027$, which would correspond to an average of 6 values out of 2385.

Figure 10.10 Correlogram $h\mapsto\hat\rho(|r_t|,|r_{t-h}|)$ of the absolute values (left) and cross correlogram $h\mapsto\hat\rho(|r_t|,r_{t-h})$ measuring the leverage effect (right), for the CAC 40 series (top), for the standard GARCH (middle), and for the TGARCH (bottom) estimated on the CAC 40 series.

Table 10.6 Variance ($\times10^4$) and kurtosis of the CAC 40 index and of simulations of length 50000 of the five estimated models.

            CAC 40   GARCH   EGARCH   QGARCH   GJR   TGARCH
Kurtosis    5.9      3.5     3.4      3.3      3.6   3.4
Variance    1.3      1.3     1.3      1.3      1.3   1.3
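The figure $2\{1-\Phi(3)\}\approx0.0027$ and the count of returns outside $\bar r\pm3\hat\sigma_t$ are easy to reproduce. The Python sketch below uses `math.erfc`, since $2\{1-\Phi(k)\}=\operatorname{erfc}(k/\sqrt{2})$; `count_outside` is a hypothetical helper taking the returns and a series of estimated volatilities.

```python
import math


def prob_outside(k=3.0):
    """P(|Z| > k) for Z ~ N(0, 1), i.e. 2{1 - Phi(k)} = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2))


def count_outside(returns, sigmas, k=3.0):
    """Number of r_t outside [rbar - k sigma_t, rbar + k sigma_t], rbar = sample mean."""
    rbar = sum(returns) / len(returns)
    return sum(abs(r - rbar) > k * s for r, s in zip(returns, sigmas))


expected_count = 2385 * prob_outside()  # about 6.4 returns under a correct Gaussian model
```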
Asymmetric GARCH Models with Non-Gaussian Innovations 


To take into account the leptokurtic shape of the distribution of the residuals, we re-estimated the five GARCH models with a Student $t$ distribution (whose parameter is estimated) for $\eta_t$.
Table 10.7 Number of CAC returns outside the limits $\bar r\pm3\hat\sigma_t$ (THEO being the theoretical number when the conditional distribution is $\mathcal{N}(0,\hat\sigma_t^2)$).


THEO GARCH EGARCH QGARCH GJR TGARCH 
6 17 13 14 15 13 


Table 10.8 Means of the squares of the differences between the estimated volatilities ($\times10^{10}$) for the models with Student innovations and the TGARCH model with Gaussian innovations (model (10.36), denoted TGARCH^N).

            GARCH   EGARCH   QGARCH   GJR     TGARCH   TGARCH^N
GARCH       0       5.90     2.72     5.89    7.71     15.77
EGARCH      5.90    0        2.27     5.08    0.89     8.92
QGARCH      2.72    2.27     0        2.34    3.35     9.64
GJR         5.89    5.08     2.34     0       7.21     11.46
TGARCH      7.71    0.89     3.35     7.21    0        7.75
TGARCH^N    15.77   8.92     9.64     11.46   7.75     0


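For instance, the new estimated TGARCH model is

$$r_t=\underset{(2\times10^{-4})}{5\times10^{-4}}+\epsilon_t,\qquad\epsilon_t=\sigma_t\eta_t,\qquad\eta_t\sim t(9.7),$$
$$\sigma_t=\underset{(1\times10^{-4})}{4\times10^{-4}}+\underset{(0.01)}{0.03}\,\epsilon_{t-1}^+-\underset{(0.02)}{0.10}\,\epsilon_{t-1}^-+\underset{(0.02)}{0.90}\,\sigma_{t-1}. \tag{10.37}$$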


It can be seen that the estimated volatility is quite different from that obtained with the normal 
distribution (see Table 10.8). 


Model with Interventions 


Analysis of the residuals shows that the values observed at times $t=228$, 682 and 845 are scarcely compatible with the selected model. There are two ways to address this issue: one could either search for a new specification that makes those values compatible with the model, or treat these three values as outliers for the selected model.

In the first case, one could replace the $\mathcal{N}(0,1)$ distribution of the noise $\eta_t$ with a more appropriate (leptokurtic) one. The first difficulty with this is that no distribution is evident for these data (it is clear that distributions of Student $t$ or generalized error type would not provide good approximations of the distribution of the standardized residuals). The second difficulty is that changing the distribution might considerably enlarge the confidence intervals. Take the example of a 99% confidence interval at horizon 1. The initial interval $[\bar r-2.57\hat\sigma_t,\bar r+2.57\hat\sigma_t]$ simply becomes the dilated interval $[\bar r-t_{0.995}\hat\sigma_t,\bar r+t_{0.995}\hat\sigma_t]$, where $t_{0.995}\gg2.57$ denotes the 0.995-quantile of the new standardized distribution, provided that the estimates $\hat\sigma_t$ are not much affected by the change of conditional distribution. Even if the new interval does contain 99% of returns, there is a good chance that it will be excessively large for most of the data.

So, for this first approach, we should ideally change the prediction formula for $\sigma_t$ so that the estimated volatility is larger for the three special data points (the resulting smaller standardized residuals $(r_t-\bar r)/\hat\sigma_t$ would become consistent with the $\mathcal{N}(0,1)$ distribution), without much changing the volatilities estimated for the other data. Finding a reasonable model that achieves this change seems quite difficult.

We have therefore opted for the second approach, treating these three values as outliers. Conceptually, this amounts to assuming that the model is not appropriate in certain circumstances. One can imagine that exceptional events occurred shortly before the three dates $t=228$, 682 and 845. Other special events may occur in the future, and our model will be unable to anticipate the changes in volatility induced by these extraordinary events. The ideal would be to know the values that returns would have had if these exceptional events had not occurred, and to work with these corrected values. This is of course not possible, so the adjusted values must also be estimated. We will use an intervention model, assuming that only the returns of the three dates would have changed in the absence of the above-mentioned exceptional events. Other types of interventions can of course be envisaged. To estimate what the returns of the three dates would have been in the absence of exceptional events, we can add these three values to the parameters of the likelihood. This can easily be done using a SAS program (see Table 10.9).

Table 10.9 SAS program for the fitting of a TGARCH(1, 1) model with interventions.

/* Data reading */
data cac;
   infile 'c:\enseignement\PRedess\Garch\cac8898.dat';
   input indice;
   date = _n_;
run;

/* Estimation of a TGARCH(1,1) model */
proc model data = cac;
   /* Initial values are attributed to the parameters */
   parameters cacmod1 -0.075735 cacmod2 -0.064956 cacmod3 -0.0349778 omega .000779460
              alpha_plus 0.034732 alpha_moins 0.12200 beta 0.86887 intercept .000426280;

   /* The index is regressed on a constant and 3 interventions are made */
   if (_obs_ = 682) then indice = cacmod1;
   else if (_obs_ = 228) then indice = cacmod2;
   else if (_obs_ = 845) then indice = cacmod3;
   else indice = intercept;

   /* The conditional variance is modeled by a TGARCH */
   if (_obs_ = 1) then do;
      /* first observation: start from the stationary mean of sigma_t */
      if ((alpha_plus+alpha_moins)/sqrt(2*constant('pi'))+beta = 1) then
         h.indice = (omega + (alpha_plus/2+alpha_moins/2+beta)*sqrt(mse.indice))**2;
      else
         h.indice = (omega/(1-(alpha_plus+alpha_moins)/sqrt(2*constant('pi'))-beta))**2;
   end;
   else do;
      /* -resid.indice is the lagged innovation epsilon_{t-1} */
      if zlag(-resid.indice) > 0 then
         h.indice = (omega + alpha_plus*zlag(-resid.indice) + beta*zlag(sqrt(h.indice)))**2;
      else
         h.indice = (omega - alpha_moins*zlag(-resid.indice) + beta*zlag(sqrt(h.indice)))**2;
   end;

   /* The model is fitted and the normalized residuals are stored in a SAS table */
   outvars nresid.indice;
   fit indice / method = marquardt fiml out = residtgarch;
run;
quit;
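Outside SAS, the structure of the program in Table 10.9 can be mirrored by a Gaussian log-likelihood in which the mean of $r_t$ is replaced, at the intervention dates, by a free parameter. The Python sketch below is our own transcription (the parameter names and the `interventions` dictionary are hypothetical) and only evaluates the log-likelihood; maximizing it over the TGARCH and intervention parameters would require a numerical optimizer, which is not shown. As in Table 10.9, $\sigma_1$ is set to the stationary mean of $\sigma_t$ under Gaussian innovations.

```python
import math


def tgarch_loglik(params, r, interventions=None):
    """Gaussian log-likelihood of r_t = m_t + eps_t, eps_t = sigma_t eta_t, TGARCH(1,1) sigma_t.

    params: dict with keys intercept, omega, alpha_plus, alpha_minus, beta.
    interventions: dict {t: value}, t being the 1-based observation index; at these
    dates the mean m_t is the intervention parameter instead of the intercept."""
    interventions = interventions or {}
    c, omega = params["intercept"], params["omega"]
    ap, am, b = params["alpha_plus"], params["alpha_minus"], params["beta"]
    denom = 1 - (ap + am) / math.sqrt(2 * math.pi) - b
    if denom <= 0:
        raise ValueError("this initialization requires a finite E(sigma_t)")
    sigma = omega / denom  # sigma_1: stationary mean of sigma_t, as in Table 10.9
    loglik, eps_prev = 0.0, None
    for t, x in enumerate(r, start=1):
        if eps_prev is not None:
            sigma = omega + ap * max(eps_prev, 0.0) - am * min(eps_prev, 0.0) + b * sigma
        eps = x - interventions.get(t, c)  # residual, with date-specific mean if any
        loglik -= 0.5 * (math.log(2 * math.pi) + 2 * math.log(sigma) + (eps / sigma) ** 2)
        eps_prev = eps
    return loglik
```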


10.7 Bibliographical Notes 


The asymmetric reaction of the volatility to past positive and negative shocks has been well 
documented since the articles by Black (1976) and Christie (1982). These articles use the leverage 
effect to explain the fact that the volatility tends to overreact to price decreases, compared to price 
increases of the same magnitude. Other explanations, related to the existence of time-dependent 
risk premia, have been proposed; see, for instance, Campbell and Hentschel (1992), Bekaert and 
Wu (2000) and references therein. More recently, Avramov, Chordia and Goyal (2006) advanced 
an explanation founded on the volume of the daily exchanges. Empirical evidence of asymmetry 
has been given in numerous studies: see, for example, Engle and Ng (1993), Glosten et al. (1993),
Nelson (1991), Wu (2001) and Zakoïan (1994). 
The introduction of ‘news impact curves’ providing a visualization of the different forms of volatility is due to Pagan and Schwert (1990) and Engle and Ng (1993). The AVGARCH model was introduced by Taylor (1986) and Schwert (1989). The EGARCH model was introduced and studied by Nelson (1991). The GJR-GARCH model was introduced by Glosten, Jagannathan and Runkle (1993). The TGARCH model was introduced and studied by Zakoïan (1994). This model is inspired by the threshold models of Tong (1978) and Tong and Lim (1980), which are used for the conditional mean. See Gonçalves and Mendes Lopes (1994, 1996) for the stationarity study of the TGARCH model. An extension was proposed by Rabemananjara and Zakoïan (1993) in which the volatility coefficients are not constrained to be positive. The TGARCH model was also extended to the case of a nonzero threshold by Hentschel (1995), and to the case of multiple thresholds by Liu et al. (1997). Model (10.21), with $\delta=2$ and $\varsigma_1=\cdots=\varsigma_q$, was studied by Straumann (2005). This model is called ‘asymmetric GARCH’ (AGARCH(p, q)) by Straumann, but the acronym AGARCH, which has been employed for several other models, is ambiguous. A variant is the double-threshold ARCH (DTARCH) model of Li and Li (1996), in which the thresholds appear both in the conditional mean and the conditional variance. Specifications making the transition variable continuous were proposed by Hagerud (1997), Gonzalez-Rivera (1998) and Taylor (2004). Various classes of models rely on Box–Cox transformations of the volatility: the APARCH model was proposed by Higgins and Bera (1992) in its symmetric form (NGARCH model) and then generalized by Ding, Granger and Engle (1993); another generalization is that of Hwang and Kim (2004). The qualitative threshold ARCH model was proposed by Gouriéroux and Monfort (1992), the quadratic ARCH model by Sentana (1995). The conditional density was modeled by Hansen (1994). The contemporaneous asymmetric GARCH model of Section 10.5 was proposed by El Babsiri and Zakoïan (2001). In this article, the strict and second-order stationarity conditions were established and the statistical inference was studied. Recent comparisons of asymmetric GARCH models were proposed by Awartani and Corradi (2006), Chen, Gerlach and So (2006) and Hansen and Lunde (2005).


10.8 Exercises 


10.1 (Noncorrelation between the volatility and past values when the law of $\eta_t$ is symmetric)
Prove the symmetry property (10.1).


10.2 (The expectation of a product of independent variables is not always the product of the expectations)
Find a sequence $X_i$ of independent real random variables such that $Y=\prod_{i=1}^{\infty}X_i$ exists almost surely, $EY$ and $EX_i$ exist for all $i$, and $\prod_{i=1}^{\infty}EX_i$ exists, but such that $EY\neq\prod_{i=1}^{\infty}EX_i$.


10.3 (Convergence of an infinite product entails convergence of the infinite sum of logarithms)
Prove that, under the assumptions of Theorem 10.1, condition (10.6) entails the absolute convergence of the series of general term $\log g_\eta(\lambda_i)$.


10.4 (Variance of an EGARCH)
Complete the proof of Theorem 10.1 by showing in detail that (10.7) entails the desired result on $E\epsilon_t^2$.


10.5 (A Gaussian EGARCH admits a variance) 
Show that, for an EGARCH with Gaussian innovations, condition (10.8) for the existence 
of a second-order moment is satisfied. 


10.6 (ARMA representation for the logarithm of the square of an EGARCH)
Compute the ARMA representation of $\log\epsilon_t^2$ when $\epsilon_t$ is an EGARCH(1, 1) process with $\eta_t$ Gaussian. Provide an explicit expression by giving numerical values for the EGARCH coefficients.

10.7 ($\beta$-mixing of an EGARCH)
Using Exercise 3.5, give simple conditions for an EGARCH(1, 1) process to be geometrically $\beta$-mixing.

10.8 (Stationarity of a TGARCH)
Establish the second-order stationarity condition (10.14) of a TGARCH(1, 1) process.

10.9 (Autocorrelation of the absolute value of a TGARCH)
Compute the autocorrelation function of the absolute value of a TGARCH(1, 1) process when the noise $\eta_t$ is Gaussian. Would this computation be feasible for a standard GARCH process?

10.10 (A TGARCH is an APARCH)
Check that the results obtained for the APARCH(1, 1) model can be used to retrieve those obtained for the TGARCH(1, 1) model.

10.11 (Study of a threshold model)
Consider the model

$$\epsilon_t=\sigma_t\eta_t,\qquad\log\sigma_t=\omega+\alpha_+\eta_{t-1}\mathbf{1}_{\{\eta_{t-1}>0\}}-\alpha_-\eta_{t-1}\mathbf{1}_{\{\eta_{t-1}<0\}}.$$

To which class does this model belong? Which constraints is it natural to impose on the coefficients? What are the strict and second-order stationarity conditions? Compute $\mathrm{Cov}(\sigma_t,\epsilon_{t-1})$ in the case where $\eta_t\sim\mathcal{N}(0,1)$, and verify that the model can capture the leverage effect.


11 


Multivariate GARCH Processes 


While the volatility of univariate series has been the focus of the previous chapters, modeling the 
comovements of several series is of great practical importance. When several series displaying 
temporal or contemporaneous dependencies are available, it is useful to analyze them jointly, by 
viewing them as the components of a vector-valued (multivariate) process. The standard linear 
modeling of real time series has a natural multivariate extension through the framework of the 
vector ARMA (VARMA) models. In particular, the subclass of vector autoregressive (VAR) models 
has been widely studied in the econometric literature. This extension entails numerous specific 
problems and has given rise to new research areas (such as cointegration). 

Similarly, it is important to introduce the concept of multivariate GARCH model. For instance, 
asset pricing and risk management crucially depend on the conditional covariance structure of the 
assets of a portfolio. Unlike the ARMA models, however, the GARCH model specification does not 
suggest a natural extension to the multivariate framework. Indeed, the (conditional) expectation 
of a vector of size m is a vector of size m, but the (conditional) variance is an m x m matrix. 
A general extension of the univariate GARCH processes would involve specifying each of the 
m(m + 1)/2 entries of this matrix as a function of its past values and the past values of the 
other entries. Given the excessive number of parameters that this approach would entail, it is not 
feasible from a statistical point of view. An alternative approach is to introduce some specification 
constraints which, while preserving a certain generality, make these models operational. 

We start by reviewing the main concepts for the analysis of the multivariate time series. 


11.1 Multivariate Stationary Processes 


In this section, we consider a vector process $(X_t)_{t\in\mathbb{Z}}$ of dimension $m$, $X_t=(X_{1t},\ldots,X_{mt})'$. The definition of strict stationarity (see Chapter 1, Definition 1.1) remains valid for vector processes, while second-order stationarity is defined as follows.


Definition 11.1 (Second-order stationarity) The process $(X_t)$ is said to be second-order stationary if: 


(i) $EX_{it}^2<\infty$, $\forall t\in\mathbb{Z}$, $i=1,\dots,m$; 
(ii) $EX_t=\mu$, $\forall t\in\mathbb{Z}$; 
(iii) $\mathrm{Cov}(X_t,X_{t+h})=E[(X_t-\mu)(X_{t+h}-\mu)']=\Gamma(h)$, $\forall t,h\in\mathbb{Z}$. 


The function $\Gamma(\cdot)$, taking values in the space of $m\times m$ matrices, is called the autocovariance 
function of $(X_t)$. 


Obviously $\Gamma_X(h)=\Gamma_X(-h)'$. In particular, $\Gamma_X(0)=\mathrm{Var}(X_t)$ is a symmetric matrix. 
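To illustrate the relation $\Gamma_X(-h)=\Gamma_X(h)'$ numerically, here is a minimal Python/NumPy sketch of the empirical autocovariance function of an $(n,m)$ data array. The function name `sample_autocov`, the divisor $n$ and the simulated VMA(1) test data are choices made for this example, not part of the text.

```python
import numpy as np

def sample_autocov(X, h):
    """Empirical Gamma(h) = Cov(X_t, X_{t+h}) of an (n, m) array X, with divisor n."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center each component
    if h >= 0:
        return Xc[: n - h].T @ Xc[h:] / n    # sum over t of (X_t - mean)(X_{t+h} - mean)'
    return Xc[-h:].T @ Xc[: n + h] / n       # same sum with X_{t+h} = X_{t-|h|}

rng = np.random.default_rng(0)
Z = rng.standard_normal((501, 3))
X = Z[1:] + 0.5 * Z[:-1]                     # a stationary VMA(1), used only as test data
G2, Gm2 = sample_autocov(X, 2), sample_autocov(X, -2)
```

With this definition the empirical autocovariances satisfy the same symmetry exactly, since changing the sign of $h$ only swaps the two factors of each cross-product.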

The simplest example of a multivariate stationary process is white noise, defined as a sequence 
of centered and uncorrelated variables whose covariance matrix is time-independent. 

The following property can be used to construct stationary processes by linear transformation 
of another stationary process. 


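Theorem 11.1 (Stationary linear filter) Let $(Z_t)$ denote a stationary process, $Z_t\in\mathbb{R}^m$. Let 
$(C_k)_{k\in\mathbb{Z}}$ denote a sequence of nonrandom $n\times m$ matrices, such that, for all $i=1,\dots,n$, for 
all $j=1,\dots,m$, $\sum_{k\in\mathbb{Z}}|c_{ij}^{(k)}|<\infty$, where $C_k=(c_{ij}^{(k)})$. Then the $\mathbb{R}^n$-valued process defined by 
$X_t=\sum_{k\in\mathbb{Z}}C_kZ_{t-k}$ is stationary and we have, in obvious notation, 


$$\mu_X=\sum_{k\in\mathbb{Z}}C_k\mu_Z,\qquad \Gamma_X(h)=\sum_{k,\ell\in\mathbb{Z}}C_k\,\Gamma_Z(h+k-\ell)\,C_\ell'.$$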


The proof of an analogous result is given by Brockwell and Davis (1991, pp. 83-84) and the 
arguments used extend straightforwardly to the multivariate setting. When, in this theorem, $(Z_t)$ is 
a white noise and $C_k=0$ for all $k<0$, $(X_t)$ is called a vector moving average process of infinite 
order, VMA($\infty$). A multivariate extension of Wold's representation theorem (see Hannan, 1970, 
pp. 157-158) states that if $(X_t)$ is a stationary and purely nondeterministic process, it can be 
represented as an infinite-order moving average, 


$$X_t=\sum_{k=0}^{\infty}C_k\epsilon_{t-k}=C(B)\epsilon_t,\qquad C_0=I_m, \tag{11.1}$$


where $(\epsilon_t)$ is an $(m\times 1)$ white noise, $B$ is the lag operator, $C(B)=\sum_{k=0}^{\infty}C_kB^k$, and the matrices 
$C_k$ are not necessarily absolutely summable but satisfy the (weaker) condition $\sum_{k=0}^{\infty}\|C_k\|^2<\infty$, 
for any matrix norm $\|\cdot\|$. The following definition generalizes the notion of a scalar ARMA 
process to the multivariate case. 


Definition 11.2 (VARMA($p,q$) process) An $\mathbb{R}^m$-valued process $(X_t)_{t\in\mathbb{Z}}$ is called a vector 
ARMA process of orders $p$ and $q$ (VARMA($p,q$)) if $(X_t)_{t\in\mathbb{Z}}$ is a stationary solution to the 
difference equation 

$$\Phi(B)X_t=c+\Psi(B)\epsilon_t, \tag{11.2}$$


where $(\epsilon_t)$ is an $(m\times 1)$ white noise with covariance matrix $\Omega$, $c$ is an $m\times 1$ vector, and $\Phi(z)=I_m-\Phi_1z-\dots-\Phi_pz^p$ and $\Psi(z)=I_m-\Psi_1z-\dots-\Psi_qz^q$ are matrix-valued polynomials. 


Denote by det(A), or more simply |A| when there is no ambiguity, the determinant of a square 
matrix A. A sufficient condition for the existence of a stationary and invertible solution to the 
preceding equation is 


$$|\Phi(z)|\,|\Psi(z)|\neq 0,\quad\text{for all } z\in\mathbb{C} \text{ such that } |z|\le 1$$


(see Brockwell and Davis, 1991, Theorems 11.3.1 and 11.3.2). 
When $p=0$, the process is called vector moving average of order $q$ (VMA($q$)); when $q=0$, 
the process is called vector autoregressive of order $p$ (VAR($p$)). 

Note that the determinant $|\Phi(z)|$ is a polynomial admitting a finite number of roots $z_1,\dots,z_{mp}$. 
Let $\delta=\min_i|z_i|>1$. The power series expansion 


$$\Phi(z)^{-1}:=|\Phi(z)|^{-1}\Phi^*(z)=\sum_{k=0}^{\infty}C_kz^k, \tag{11.3}$$


where $A^*$ denotes the adjoint of the matrix $A$ (that is, the transpose of the matrix of the cofactors 
of $A$), is well defined for $|z|<\delta$, and is such that $\Phi(z)^{-1}\Phi(z)=I$. The matrices $C_k$ are recursively 
obtained by 


$$C_0=I,\quad\text{and for } k\ge 1,\quad C_k=\sum_{\ell=1}^{\min(k,p)}C_{k-\ell}\Phi_\ell. \tag{11.4}$$
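The recursion (11.4) is easy to implement. The following sketch, assuming a pure VAR($p$) so that $\Psi(z)=I_m$, computes $C_0,\dots,C_K$ and also builds the companion matrix $F$ of $\Phi$. Since $\det(I_{mp}-Fz)=\det\Phi(z)$, the condition that all roots of $|\Phi(z)|$ lie outside the unit disk is equivalent to all eigenvalues of $F$ lying strictly inside the unit circle. The names `var_ma_coefficients` and `companion` and the numerical values are illustrative.

```python
import numpy as np

def var_ma_coefficients(Phi, K):
    """C_0, ..., C_K from recursion (11.4); Phi = [Phi_1, ..., Phi_p], each m x m."""
    m, p = Phi[0].shape[0], len(Phi)
    C = [np.eye(m)]
    for k in range(1, K + 1):
        # C_k = sum_{l=1}^{min(k, p)} C_{k-l} Phi_l
        C.append(sum(C[k - l] @ Phi[l - 1] for l in range(1, min(k, p) + 1)))
    return C

def companion(Phi):
    """mp x mp companion matrix F, for which det(I - F z) = det Phi(z)."""
    m, p = Phi[0].shape[0], len(Phi)
    F = np.zeros((m * p, m * p))
    F[:m, :] = np.hstack(Phi)
    F[m:, :-m] = np.eye(m * (p - 1))
    return F

Phi = [np.array([[0.5, 0.1], [0.0, 0.3]]), np.array([[0.2, 0.0], [0.1, 0.1]])]
F = companion(Phi)
# Roots of det Phi(z) outside the unit disk <=> spectral radius of F below 1.
stationary = np.max(np.abs(np.linalg.eigvals(F))) < 1
C = var_ma_coefficients(Phi, 10)
```

As an independent check, $C_k$ also equals the upper-left $m\times m$ block of $F^k$.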


11.2 Multivariate GARCH Models 


As in the univariate case, we can define multivariate GARCH models by specifying their first two 
conditional moments. An $\mathbb{R}^m$-valued GARCH process $(\epsilon_t)$, with $\epsilon_t=(\epsilon_{1t},\dots,\epsilon_{mt})'$, must then 
satisfy, for all $t\in\mathbb{Z}$, 


$$E(\epsilon_t\mid\epsilon_u,u<t)=0,\qquad \mathrm{Var}(\epsilon_t\mid\epsilon_u,u<t)=E(\epsilon_t\epsilon_t'\mid\epsilon_u,u<t)=H_t. \tag{11.5}$$


The multivariate extension of the notion of the strong GARCH process is based on an equation 
of the form 


$$\epsilon_t=H_t^{1/2}\eta_t, \tag{11.6}$$


where $(\eta_t)$ is a sequence of iid $\mathbb{R}^m$-valued variables with zero mean and identity covariance matrix. 
The matrix $H_t^{1/2}$ can be chosen to be symmetric and positive definite¹ but it can also be chosen to 
be triangular, with positive diagonal elements (see, for instance, Harville, 1997, Theorem 14.5.11). 
The latter choice may be of interest because if, for instance, $H_t^{1/2}$ is chosen to be lower triangular, 
the first component of $\epsilon_t$ only depends on the first component of $\eta_t$. When $m=2$, we can thus set 


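$$\begin{cases}\epsilon_{1t}=h_{11,t}^{1/2}\eta_{1t},\\[4pt] \epsilon_{2t}=\dfrac{h_{12,t}}{h_{11,t}^{1/2}}\eta_{1t}+\left(\dfrac{h_{11,t}h_{22,t}-h_{12,t}^2}{h_{11,t}}\right)^{1/2}\eta_{2t},\end{cases} \tag{11.7}$$


where $\eta_{it}$ and $h_{ij,t}$ denote the generic elements of $\eta_t$ and $H_t$. 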

Note that any square integrable solution $(\epsilon_t)$ of (11.6) is a martingale difference satisfying (11.5). 
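As a small check of the two choices of $H_t^{1/2}$, the following NumPy sketch computes the symmetric root described in footnote 1 (via an eigendecomposition) and the lower triangular root whose entries appear in (11.7). The helper names and the numerical $H$ and $\eta$ are hypothetical.

```python
import numpy as np

def symmetric_sqrt(H):
    """Symmetric positive definite root R = P Lambda^{1/2} P' (footnote 1)."""
    lam, P = np.linalg.eigh(H)
    return P @ np.diag(np.sqrt(lam)) @ P.T

def lower_root_2x2(H):
    """Lower triangular root of a 2 x 2 positive definite H, entries as in (11.7)."""
    h11, h12, h22 = H[0, 0], H[0, 1], H[1, 1]
    return np.array([[np.sqrt(h11), 0.0],
                     [h12 / np.sqrt(h11), np.sqrt((h11 * h22 - h12 ** 2) / h11)]])

H = np.array([[2.0, 0.6], [0.6, 1.0]])
eta = np.array([0.5, -1.2])
eps_sym = symmetric_sqrt(H) @ eta     # epsilon_t with the symmetric choice of H_t^{1/2}
eps_tri = lower_root_2x2(H) @ eta     # epsilon_t with the triangular choice of (11.7)
```

Both vectors have conditional variance $H$, but only in the triangular case does the first component depend on $\eta_{1t}$ alone.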

Choosing a specification for $H_t$ is obviously more delicate than in the univariate framework 
because: (i) $H_t$ should be (almost surely) symmetric, and positive definite for all $t$; (ii) the 
specification should be simple enough to be amenable to probabilistic study (existence of solutions, 
stationarity, ...), while being of sufficient generality; (iii) the specification should be parsimonious 
enough to enable feasible estimation. However, the model should not be too simple to be able to 
capture the (possibly sophisticated) dynamics in the covariance structure. 


¹ The choice is then unique because to any positive definite matrix $A$, one can associate a unique positive 
definite matrix $R$ such that $A=R^2$ (see Harville, 1997, Theorem 21.9.1). We have $R=P\Lambda^{1/2}P'$, where $\Lambda^{1/2}$ 
is a diagonal matrix, with diagonal elements the square roots of the eigenvalues of $A$, and $P$ is the orthogonal 
matrix of the corresponding eigenvectors. 
Moreover, it may be useful to have the so-called stability by aggregation property. If $\epsilon_t$ satisfies 
(11.5), the process $(\tilde\epsilon_t)$ defined by $\tilde\epsilon_t=P\epsilon_t$, where $P$ is an invertible square matrix, is such that 


$$E(\tilde\epsilon_t\mid\tilde\epsilon_u,u<t)=0,\qquad \mathrm{Var}(\tilde\epsilon_t\mid\tilde\epsilon_u,u<t)=\tilde H_t=PH_tP'. \tag{11.8}$$


The stability by aggregation of a class of specifications for $H_t$ requires that the conditional variance 
matrices $\tilde H_t$ belong to the same class for any choice of $P$. This property is particularly relevant in 
finance because if the components of the vector $\epsilon_t$ are asset returns, $\tilde\epsilon_t$ is a vector of portfolios of 
the same assets, each of its components consisting of amounts (coefficients of the corresponding 
row of $P$) of the initial assets. 


11.2.1 Diagonal Model 


A popular specification, known as the diagonal representation, is obtained by assuming that each 
element $h_{k\ell,t}$ of the covariance matrix $H_t$ is formulated in terms only of the product of the prior 
$k$ and $\ell$ returns. Specifically, 


$$h_{k\ell,t}=\omega_{k\ell}+\sum_{i=1}^{q}a_{k\ell}^{(i)}\epsilon_{k,t-i}\epsilon_{\ell,t-i}+\sum_{j=1}^{p}b_{k\ell}^{(j)}h_{k\ell,t-j},$$


with $\omega_{k\ell}=\omega_{\ell k}$, $a_{k\ell}^{(i)}=a_{\ell k}^{(i)}$, $b_{k\ell}^{(j)}=b_{\ell k}^{(j)}$ for all $(k,\ell)$. For $m=1$ this model coincides with the 
usual univariate formulation. When $m>1$ the model obviously has a large number of parameters, 
namely $(1+p+q)m(m+1)/2$, and will not in general produce positive definite covariance matrices $H_t$. We have 


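$$H_t=\begin{pmatrix}\omega_{11}&\cdots&\omega_{1m}\\ \vdots&&\vdots\\ \omega_{1m}&\cdots&\omega_{mm}\end{pmatrix}+\sum_{i=1}^{q}\begin{pmatrix}a_{11}^{(i)}\epsilon_{1,t-i}^2&\cdots&a_{1m}^{(i)}\epsilon_{1,t-i}\epsilon_{m,t-i}\\ \vdots&&\vdots\\ a_{1m}^{(i)}\epsilon_{1,t-i}\epsilon_{m,t-i}&\cdots&a_{mm}^{(i)}\epsilon_{m,t-i}^2\end{pmatrix}+\sum_{j=1}^{p}\begin{pmatrix}b_{11}^{(j)}h_{11,t-j}&\cdots&b_{1m}^{(j)}h_{1m,t-j}\\ \vdots&&\vdots\\ b_{1m}^{(j)}h_{1m,t-j}&\cdots&b_{mm}^{(j)}h_{mm,t-j}\end{pmatrix}$$

$$=\Omega+\sum_{i=1}^{q}\mathrm{diag}(\epsilon_{t-i})A^{(i)}\mathrm{diag}(\epsilon_{t-i})+\sum_{j=1}^{p}B^{(j)}\odot H_{t-j},$$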


where $\odot$ denotes the Hadamard product, that is, the element by element product.² Thus, in the 
ARCH case ($p=0$), sufficient positivity conditions are that $\Omega$ is positive definite and the $A^{(i)}$ are 
positive semi-definite, but these constraints do not easily generalize to the GARCH case. We shall 
give further positivity conditions obtained by expressing the model in a different way, viewing it 
as a particular case of a more general class. 
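A single step of the diagonal recursion, written in the compact matrix form above with $p=q=1$, can be sketched as follows. It relies on the identity $\mathrm{diag}(\epsilon)A\,\mathrm{diag}(\epsilon)=A\odot\epsilon\epsilon'$. The parameter values are invented for illustration, and nothing in this step enforces positive definiteness of the result.

```python
import numpy as np

def diagonal_garch11_step(Omega, A, B, eps_prev, H_prev):
    """H_t = Omega + diag(eps_{t-1}) A diag(eps_{t-1}) + B (Hadamard) H_{t-1}, p = q = 1."""
    D = np.diag(eps_prev)
    return Omega + D @ A @ D + B * H_prev     # '*' is numpy's elementwise (Hadamard) product

Omega = np.array([[0.10, 0.02], [0.02, 0.20]])
A = np.array([[0.05, 0.03], [0.03, 0.08]])
B = np.array([[0.90, 0.85], [0.85, 0.88]])
eps_prev = np.array([1.0, -0.5])
H_prev = np.array([[1.0, 0.3], [0.3, 1.5]])
H = diagonal_garch11_step(Omega, A, B, eps_prev, H_prev)
```

Each entry $h_{k\ell,t}$ only involves $\epsilon_{k,t-1}\epsilon_{\ell,t-1}$ and $h_{k\ell,t-1}$, which is the lack of interaction criticized in the next paragraph.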

It is easy to see that the model is not stable by aggregation: for instance, the conditional 
variance of $\epsilon_{1,t}+\epsilon_{2,t}$ can in general be expressed as a function of the $\epsilon_{1,t-i}^2$ and $\epsilon_{2,t-i}^2$, but not 
of the $(\epsilon_{1,t-i}+\epsilon_{2,t-i})^2$. A final drawback of this model is that there is no interaction between the 
different components of the conditional covariance, which appears unrealistic for applications to 
financial series. 

In what follows we present the main specifications introduced in the literature, before turning 
to the existence of solutions. Let $\eta$ denote a probability distribution on $\mathbb{R}^m$, with zero mean and 
unit covariance matrix. 


² For two matrices $A=(a_{ij})$ and $B=(b_{ij})$ of the same dimension, $A\odot B=(a_{ij}b_{ij})$. 
11.2.2 Vector GARCH Model 


The vector GARCH (VEC-GARCH) model is the most direct generalization of univariate GARCH: 
every conditional covariance is a function of lagged conditional variances as well as lagged cross- 
products of all components. In some sense, everything is explained by everything, which makes 
this model very general but also not very parsimonious. 

Denote by $\mathrm{vech}(\cdot)$ the operator that stacks the columns of the lower triangular part of its 
argument square matrix (if $A=(a_{ij})$, then $\mathrm{vech}(A)=(a_{11},a_{21},\dots,a_{m1},a_{22},\dots,a_{m2},\dots,a_{mm})'$). 
The next definition is a natural extension of the standard GARCH($p,q$) specification. 


Definition 11.3 (VEC-GARCH($p,q$) process) Let $(\eta_t)$ be a sequence of iid variables with distribution 
$\eta$. The process $(\epsilon_t)$ is said to admit a VEC-GARCH($p,q$) representation (relative to the 
sequence $(\eta_t)$) if it satisfies 


$$\epsilon_t=H_t^{1/2}\eta_t,\quad\text{where } H_t \text{ is positive definite such that}$$

$$\mathrm{vech}(H_t)=\omega+\sum_{i=1}^{q}A^{(i)}\mathrm{vech}(\epsilon_{t-i}\epsilon_{t-i}')+\sum_{j=1}^{p}B^{(j)}\mathrm{vech}(H_{t-j}), \tag{11.9}$$


where $\omega$ is a vector of size $\{m(m+1)/2\}\times 1$, and the $A^{(i)}$ and $B^{(j)}$ are matrices of dimension 
$m(m+1)/2\times m(m+1)/2$. 


Remark 11.1 (The diagonal model is a special case of the VEC-GARCH model) The diagonal 
model admits a vector representation, obtained for diagonal matrices $A^{(i)}$ and $B^{(j)}$. 

We will show that the class of VEC-GARCH models is stable by aggregation. Recall that the 
vec(-) operator converts any matrix to a vector by stacking all the columns of the matrix into one 
vector. It is related to the vech operator by the formulas 


$$\mathrm{vec}(A)=D_m\,\mathrm{vech}(A),\qquad \mathrm{vech}(A)=D_m^+\,\mathrm{vec}(A), \tag{11.10}$$


where $A$ is any $m\times m$ symmetric matrix, $D_m$ is a full-rank $m^2\times m(m+1)/2$ matrix (the so-called 
'duplication matrix'), whose entries are only 0 and 1, and $D_m^+=(D_m'D_m)^{-1}D_m'$.³ We also have 
the relation 


$$\mathrm{vec}(ABC)=(C'\otimes A)\,\mathrm{vec}(B), \tag{11.11}$$

where $\otimes$ denotes the Kronecker matrix product,⁴ provided the product $ABC$ is well defined. 
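The operators vec and vech, the duplication matrix $D_m$ and the identity (11.11) can be checked numerically. The sketch below is a straightforward NumPy construction (column-major `reshape` for vec); the function names are our own, and the expected $D_2$ and $D_2^+$ are those given in footnote 3.

```python
import numpy as np

def vec(A):
    return A.reshape(-1, order="F")          # stack the columns

def vech(A):
    # lower triangular part, column by column
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def duplication(m):
    """D_m (m^2 x m(m+1)/2) such that vec(A) = D_m vech(A) for symmetric A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):                # same ordering as vech
            D[j * m + i, col] = 1.0          # position of a_ij in vec(A), 0-based
            D[i * m + j, col] = 1.0          # position of a_ji
            col += 1
    return D

def duplication_plus(m):
    D = duplication(m)
    return np.linalg.solve(D.T @ D, D.T)     # D_m^+ = (D_m' D_m)^{-1} D_m'
```

The test below also checks the row/column rule stated in footnote 3 for $m=3$.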


Theorem 11.2 (The VEC-GARCH is stable by aggregation) Let $(\epsilon_t)$ be a VEC-GARCH($p,q$) 
process. Then, for any invertible $m\times m$ matrix $P$, the process $\tilde\epsilon_t=P\epsilon_t$ is a VEC-GARCH($p,q$) process. 


³ For instance, 


$$D_1=(1),\qquad D_2=\begin{pmatrix}1&0&0\\0&1&0\\0&1&0\\0&0&1\end{pmatrix},\qquad D_2^+=\begin{pmatrix}1&0&0&0\\0&1/2&1/2&0\\0&0&0&1\end{pmatrix}.$$


More generally, for $i\ge j$, the $[(j-1)m+i]$th and $[(i-1)m+j]$th rows of $D_m$ equal the $m(m+1)/2$-dimensional row vector all of whose entries are null, with the exception of the $[(j-1)(m-j/2)+i]$th, 
equal to 1. 

⁴ If $A=(a_{ij})$ is an $m\times n$ matrix and $B$ is an $m'\times n'$ matrix, $A\otimes B$ is the $mm'\times nn'$ matrix admitting 
the block elements $a_{ij}B$. 
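Proof. Setting $\tilde H_t=PH_tP'$, we have $\tilde\epsilon_t=\tilde H_t^{1/2}\eta_t$ and 

$$\mathrm{vech}(\tilde H_t)=D_m^+(P\otimes P)D_m\,\mathrm{vech}(H_t)=\tilde\omega+\sum_{i=1}^{q}\tilde A^{(i)}\mathrm{vech}(\tilde\epsilon_{t-i}\tilde\epsilon_{t-i}')+\sum_{j=1}^{p}\tilde B^{(j)}\mathrm{vech}(\tilde H_{t-j}),$$

where 

$$\tilde\omega=D_m^+(P\otimes P)D_m\,\omega,$$

$$\tilde A^{(i)}=D_m^+(P\otimes P)D_m\,A^{(i)}\,D_m^+(P^{-1}\otimes P^{-1})D_m,$$

$$\tilde B^{(j)}=D_m^+(P\otimes P)D_m\,B^{(j)}\,D_m^+(P^{-1}\otimes P^{-1})D_m.$$


To derive the form of $\tilde A^{(i)}$ we use 

$$\mathrm{vec}(\epsilon_t\epsilon_t')=\mathrm{vec}(P^{-1}\tilde\epsilon_t\tilde\epsilon_t'P'^{-1})=(P^{-1}\otimes P^{-1})D_m\,\mathrm{vech}(\tilde\epsilon_t\tilde\epsilon_t'),$$

and for $\tilde B^{(j)}$ we use 


$$\mathrm{vec}(H_t)=\mathrm{vec}(P^{-1}\tilde H_tP'^{-1})=(P^{-1}\otimes P^{-1})D_m\,\mathrm{vech}(\tilde H_t).$$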
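The change of parameters in the proof can be verified on a VEC-ARCH(1) example. This sketch, with hypothetical helper names and arbitrary numbers, computes $\tilde\omega$ and $\tilde A^{(1)}$ from $(\omega,A^{(1)},P)$ and compares the resulting $\tilde H_t$ with $PH_tP'$. The matrices $\tilde B^{(j)}$ would be obtained in exactly the same way.

```python
import numpy as np

def vech(A):
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def duplication(m):
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[j * m + i, col] = D[i * m + j, col] = 1.0
            col += 1
    return D

def unvech(h, m):
    return (duplication(m) @ h).reshape(m, m, order="F")

def aggregated_vec_params(omega, A_list, P):
    """(omega~, [A~^(i)]) of Theorem 11.2 for eps~_t = P eps_t."""
    m = P.shape[0]
    D = duplication(m)
    D_plus = np.linalg.solve(D.T @ D, D.T)
    Pinv = np.linalg.inv(P)
    T = D_plus @ np.kron(P, P) @ D              # vech(H) -> vech(P H P')
    T_inv = D_plus @ np.kron(Pinv, Pinv) @ D    # vech(H~) -> vech(P^{-1} H~ P'^{-1})
    return T @ omega, [T @ A @ T_inv for A in A_list]

rng = np.random.default_rng(2)
m = 2
omega = vech(np.array([[1.0, 0.3], [0.3, 0.5]]))
A1 = 0.1 * rng.standard_normal((3, 3))
P = np.array([[1.0, 0.5], [-0.3, 2.0]])
e = rng.standard_normal(m)                      # plays the role of eps_{t-1}
H = unvech(omega + A1 @ vech(np.outer(e, e)), m)
omega_t, (A1_t,) = aggregated_vec_params(omega, [A1], P)
e_t = P @ e
H_t = unvech(omega_t + A1_t @ vech(np.outer(e_t, e_t)), m)
```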


Positivity Conditions 


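We now seek conditions ensuring the positivity of $H_t$. A generic element of 

$$h_t=\mathrm{vech}(H_t)$$

is denoted by $h_{k\ell,t}$ ($k\ge\ell$) and we will denote by $a_{k\ell,k'\ell'}^{(i)}$ ($b_{k\ell,k'\ell'}^{(j)}$) the entry of $A^{(i)}$ ($B^{(j)}$) located 
on the same row as $h_{k\ell,t}$ and belonging to the same column as the element $h_{k'\ell',t}$ of $h_t$. We thus 
have an expression of the form 


$$h_{k\ell,t}=\mathrm{Cov}(\epsilon_{kt},\epsilon_{\ell t}\mid\epsilon_u,u<t)=\omega_{k\ell}+\sum_{i=1}^{q}\sum_{\substack{k',\ell'=1\\k'\ge\ell'}}^{m}a_{k\ell,k'\ell'}^{(i)}\epsilon_{k',t-i}\epsilon_{\ell',t-i}+\sum_{j=1}^{p}\sum_{\substack{k',\ell'=1\\k'\ge\ell'}}^{m}b_{k\ell,k'\ell'}^{(j)}h_{k'\ell',t-j}.$$


Denoting by $A_{k\ell}^{(i)}$ the $m\times m$ symmetric matrix with $(k',\ell')$th entries $a_{k\ell,k'\ell'}^{(i)}/2$, for $k'\neq\ell'$, and the 
elements $a_{k\ell,k'k'}^{(i)}$ on the diagonal, the preceding equality is written as 


$$h_{k\ell,t}=\omega_{k\ell}+\sum_{i=1}^{q}\epsilon_{t-i}'A_{k\ell}^{(i)}\epsilon_{t-i}+\sum_{j=1}^{p}\sum_{\substack{k',\ell'=1\\k'\ge\ell'}}^{m}b_{k\ell,k'\ell'}^{(j)}h_{k'\ell',t-j}. \tag{11.12}$$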
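In order to obtain a more compact form for the last part of this expression, let us introduce the 
spectral decomposition of the symmetric matrices $H_t$, assumed to be positive semi-definite. We 
have $H_t=\sum_{r=1}^{m}\lambda_t^{(r)}v_t^{(r)}v_t^{(r)'}$, where $V_t=(v_t^{(1)},\dots,v_t^{(m)})$ is an orthogonal matrix of eigenvectors 
$v_t^{(r)}$ associated with the (positive) eigenvalues $\lambda_t^{(r)}$ of $H_t$. Defining the matrices $B_{k\ell}^{(j)}$ by analogy 
with the $A_{k\ell}^{(i)}$, we get 


$$h_{k\ell,t}=\omega_{k\ell}+\sum_{i=1}^{q}\epsilon_{t-i}'A_{k\ell}^{(i)}\epsilon_{t-i}+\sum_{j=1}^{p}\sum_{r=1}^{m}\lambda_{t-j}^{(r)}v_{t-j}^{(r)'}B_{k\ell}^{(j)}v_{t-j}^{(r)}. \tag{11.13}$$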
j=lr=1 
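Finally, consider the $m^2\times m^2$ matrix admitting the block form $A_i=(A_{k\ell}^{(i)})_{k,\ell}$ (with $A_{\ell k}^{(i)}=A_{k\ell}^{(i)}$ for $k>\ell$), and let $B_j=(B_{k\ell}^{(j)})_{k,\ell}$. 
The preceding expressions are equivalent to 


$$H_t=\sum_{r=1}^{m}\lambda_t^{(r)}v_t^{(r)}v_t^{(r)'}=\Omega+\sum_{i=1}^{q}(I_m\otimes\epsilon_{t-i}')A_i(I_m\otimes\epsilon_{t-i})+\sum_{j=1}^{p}\sum_{r=1}^{m}\lambda_{t-j}^{(r)}(I_m\otimes v_{t-j}^{(r)'})B_j(I_m\otimes v_{t-j}^{(r)}), \tag{11.14}$$


where $\Omega$ is the symmetric matrix such that $\mathrm{vech}(\Omega)=\omega$. 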
In this form, it is evident that the assumption 


$$A_i \text{ and } B_j \text{ are positive semi-definite and } \Omega \text{ is positive definite} \tag{11.15}$$


ensures that if the $H_{t-j}$ are almost surely positive definite, then so is $H_t$. 


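Example 11.1 (Three representations of a vector ARCH(1) model) For $p=0$, $q=1$ and 
$m=2$, the conditional variance is written, in the form (11.9), as 


$$\mathrm{vech}(H_t)=\begin{pmatrix}h_{11,t}\\h_{12,t}\\h_{22,t}\end{pmatrix}=\begin{pmatrix}\omega_{11}\\\omega_{12}\\\omega_{22}\end{pmatrix}+\begin{pmatrix}a_{11,11}&a_{11,12}&a_{11,22}\\a_{12,11}&a_{12,12}&a_{12,22}\\a_{22,11}&a_{22,12}&a_{22,22}\end{pmatrix}\begin{pmatrix}\epsilon_{1,t-1}^2\\\epsilon_{1,t-1}\epsilon_{2,t-1}\\\epsilon_{2,t-1}^2\end{pmatrix},$$


in the form (11.12) as 


$$h_{k\ell,t}=\omega_{k\ell}+(\epsilon_{1,t-1},\epsilon_{2,t-1})\begin{pmatrix}a_{k\ell,11}&a_{k\ell,12}/2\\a_{k\ell,12}/2&a_{k\ell,22}\end{pmatrix}\begin{pmatrix}\epsilon_{1,t-1}\\\epsilon_{2,t-1}\end{pmatrix},\qquad k,\ell=1,2,$$


and in the form (11.14) as 


$$H_t=\begin{pmatrix}\omega_{11}&\omega_{12}\\\omega_{12}&\omega_{22}\end{pmatrix}+(I_2\otimes\epsilon_{t-1}')\begin{pmatrix}a_{11,11}&a_{11,12}/2&a_{12,11}&a_{12,12}/2\\a_{11,12}/2&a_{11,22}&a_{12,12}/2&a_{12,22}\\a_{12,11}&a_{12,12}/2&a_{22,11}&a_{22,12}/2\\a_{12,12}/2&a_{12,22}&a_{22,12}/2&a_{22,22}\end{pmatrix}(I_2\otimes\epsilon_{t-1}).$$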


This example shows that, even for small orders, the VEC model potentially has an enormous 
number of parameters, which can make estimation of the parameters computationally demanding. 
Moreover, the positivity conditions are not directly obtained from (11.9) but from (11.14), involving 
the spectral decomposition of the matrices $H_{t-j}$. 
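To see the link between (11.9) and (11.14) concretely, the sketch below builds the $m^2\times m^2$ block matrix $A_i$ from a VEC matrix $A^{(i)}$ (halving the off-diagonal weights, as in the definition of $A_{k\ell}^{(i)}$) and evaluates the ARCH(1) conditional variance in both forms. Indices are 0-based; names and numbers are illustrative. Checking that the smallest eigenvalue of the block matrix is nonnegative is one way to test the sufficient condition (11.15).

```python
import numpy as np

def vech_pairs(m):
    """0-based index pairs (k, l), k >= l, in vech order."""
    return [(k, l) for l in range(m) for k in range(l, m)]

def block_form(A_vec, m):
    """m^2 x m^2 matrix A_i of (11.14) built from the VEC matrix A^(i) of (11.9)."""
    pairs = vech_pairs(m)
    big = np.zeros((m * m, m * m))
    for row, (k, l) in enumerate(pairs):
        Akl = np.zeros((m, m))                       # the matrix A_kl^(i) of (11.12)
        for col, (kp, lp) in enumerate(pairs):
            w = A_vec[row, col] if kp == lp else A_vec[row, col] / 2
            Akl[kp, lp] = Akl[lp, kp] = w
        big[k*m:(k+1)*m, l*m:(l+1)*m] = Akl          # block (k, l)
        big[l*m:(l+1)*m, k*m:(k+1)*m] = Akl          # and its symmetric counterpart
    return big

def H_from_vec_form(Omega, A_vec, e):
    m = len(e)
    pairs = vech_pairs(m)
    x = np.array([e[kp] * e[lp] for kp, lp in pairs])          # vech(e e')
    h = np.array([Omega[k, l] for k, l in pairs]) + A_vec @ x
    H = np.zeros((m, m))
    for (k, l), v in zip(pairs, h):
        H[k, l] = H[l, k] = v
    return H

def H_from_block_form(Omega, big, e):
    left = np.kron(np.eye(len(e)), e[None, :])       # I_m kron e', an m x m^2 matrix
    return Omega + left @ big @ left.T               # left.T = I_m kron e

rng = np.random.default_rng(3)
Omega = np.array([[0.2, 0.05], [0.05, 0.3]])
A_vec = 0.1 * np.abs(rng.standard_normal((3, 3)))
e = np.array([0.8, -1.1])
big = block_form(A_vec, 2)
H9, H14 = H_from_vec_form(Omega, A_vec, e), H_from_block_form(Omega, big, e)
condition_11_15 = np.min(np.linalg.eigvalsh(big)) >= 0   # sufficient, not necessary
```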


The following classes provide more parsimonious and tractable models. 


11.2.3 Constant Conditional Correlations Models 


Suppose that, for a multivariate GARCH process of the form (11.6), all the past information on 
$\epsilon_{kt}$, involving all the variables $\epsilon_{\ell,t-i}$, is summarized in the variable $h_{kk,t}$, with $Eh_{kk,t}=E\epsilon_{kt}^2$. 
Then, letting $\tilde\eta_{kt}=h_{kk,t}^{-1/2}\epsilon_{kt}$, we define for all $k$ a sequence of iid variables with zero mean 
and unit variance. The variables $\tilde\eta_{kt}$ are generally correlated, so let $R=\mathrm{Var}(\tilde\eta_t)=(\rho_{k\ell})$, where 
$\tilde\eta_t=(\tilde\eta_{1t},\dots,\tilde\eta_{mt})'$. The conditional variance of 


$$\epsilon_t=\mathrm{diag}(h_{11,t}^{1/2},\dots,h_{mm,t}^{1/2})\,\tilde\eta_t$$

is then written as 

$$H_t=\mathrm{diag}(h_{11,t}^{1/2},\dots,h_{mm,t}^{1/2})\,R\,\mathrm{diag}(h_{11,t}^{1/2},\dots,h_{mm,t}^{1/2}). \tag{11.16}$$


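By construction, the conditional correlations between the components of $\epsilon_t$ are time-invariant: 


$$\frac{h_{k\ell,t}}{h_{kk,t}^{1/2}h_{\ell\ell,t}^{1/2}}=\frac{E(\epsilon_{kt}\epsilon_{\ell t}\mid\epsilon_u,u<t)}{\{E(\epsilon_{kt}^2\mid\epsilon_u,u<t)\,E(\epsilon_{\ell t}^2\mid\epsilon_u,u<t)\}^{1/2}}=\rho_{k\ell}.$$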


To complete the specification, the dynamics of the conditional variances $h_{kk,t}$ has to be defined. 
The simplest constant conditional correlations (CCC) model relies on the following univariate 
GARCH specifications: 


$$h_{kk,t}=\omega_k+\sum_{i=1}^{q}a_{k,i}\epsilon_{k,t-i}^2+\sum_{j=1}^{p}b_{k,j}h_{kk,t-j},\qquad k=1,\dots,m, \tag{11.17}$$


where $\omega_k>0$, $a_{k,i}\ge 0$, $b_{k,j}\ge 0$, $-1\le\rho_{k\ell}\le 1$, $\rho_{kk}=1$, and $R$ is symmetric and positive semi-definite. Observe that the conditional variances are specified as in the diagonal model. The 
conditional covariances clearly are not linear in the squares and cross products of the returns. 

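In a multivariate framework, it seems natural to extend the specification (11.17) by allowing 
$h_{kk,t}$ to depend not only on its own past, but also on the past of all the variables $\epsilon_{\ell,t}$. Set 


$$\underline{h}_t=\begin{pmatrix}h_{11,t}\\\vdots\\h_{mm,t}\end{pmatrix},\qquad D_t=\begin{pmatrix}\sqrt{h_{11,t}}&0&\cdots&0\\0&\ddots&&\vdots\\\vdots&&\ddots&0\\0&\cdots&0&\sqrt{h_{mm,t}}\end{pmatrix},\qquad \underline{\epsilon}_t=\begin{pmatrix}\epsilon_{1t}^2\\\vdots\\\epsilon_{mt}^2\end{pmatrix}.$$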


Definition 11.4 (CCC-GARCH($p,q$) process) Let $(\eta_t)$ be a sequence of iid variables with distribution 
$\eta$. A process $(\epsilon_t)$ is called CCC-GARCH($p,q$) if it satisfies 


$$\begin{cases}\epsilon_t=H_t^{1/2}\eta_t,\\ H_t=D_tRD_t,\\ \underline{h}_t=\omega+\displaystyle\sum_{i=1}^{q}A_i\underline{\epsilon}_{t-i}+\sum_{j=1}^{p}B_j\underline{h}_{t-j},\end{cases} \tag{11.18}$$


where $R$ is a correlation matrix, $\omega$ is an $m\times 1$ vector with positive coefficients, and the $A_i$ and $B_j$ 
are $m\times m$ matrices with nonnegative coefficients. 


We have $\epsilon_t=D_t\tilde\eta_t$, where $\tilde\eta_t=R^{1/2}\eta_t$ is a centered vector with covariance matrix $R$. The 
components of $\epsilon_t$ thus have the usual expression, $\epsilon_{kt}=h_{kk,t}^{1/2}\tilde\eta_{kt}$, but the conditional variance $h_{kk,t}$ 
depends on the past of all the components of $\epsilon_t$. 
Note that the conditional covariances are generally nonlinear functions of the components of 
$\epsilon_{t-i}\epsilon_{t-i}'$ and of past values of the components of $H_t$. Model (11.18) is thus not a VEC-GARCH 
model, defined by (11.9), except when $R$ is the identity matrix. 
One advantage of this specification is that a simple condition ensuring the positive definiteness 
of $H_t$ is obtained through the positive coefficients for the matrices $A_i$ and $B_j$ and the choice of 
a positive definite matrix for $R$. We shall also see that the study of the stationarity is remarkably 
simple. 
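A CCC-GARCH(1,1) path can be simulated directly from (11.18). The sketch below assumes Gaussian $\eta_t$, uses the Cholesky factor as $R^{1/2}$, and starts the variance recursion at $(I_m-A_1-B_1)^{-1}\omega$, the unconditional mean of the variances, which only makes sense under a suitable stationarity condition (assumed to hold for the illustrative values chosen here). Function names are hypothetical.

```python
import numpy as np

def simulate_ccc_garch11(n, omega, A, B, R, rng):
    """Simulate (11.18) with p = q = 1 and Gaussian eta_t (an assumption of this sketch).
    Returns eps (n, m) and h (n, m), where h[t] = (h_11,t, ..., h_mm,t)."""
    m = len(omega)
    L = np.linalg.cholesky(R)                          # one square root of R
    h = np.empty((n, m))
    eps = np.empty((n, m))
    h[0] = np.linalg.solve(np.eye(m) - A - B, omega)   # E(h_t), assumed to exist
    for t in range(n):
        if t > 0:
            h[t] = omega + A @ eps[t - 1] ** 2 + B @ h[t - 1]
        eps[t] = np.sqrt(h[t]) * (L @ rng.standard_normal(m))   # eps_t = D_t R^{1/2} eta_t
    return eps, h

def ccc_covariances(h, R):
    """H_t = D_t R D_t for every t, returned as an (n, m, m) array."""
    d = np.sqrt(h)
    return d[:, :, None] * R[None, :, :] * d[:, None, :]

omega = np.array([0.05, 0.10])
A = np.array([[0.05, 0.02], [0.01, 0.08]])   # off-diagonal terms: h_kk,t depends on all eps_l,t-1
B = np.diag([0.90, 0.85])
R = np.array([[1.0, 0.4], [0.4, 1.0]])
eps, h = simulate_ccc_garch11(1000, omega, A, B, R, np.random.default_rng(4))
H = ccc_covariances(h, R)
```

Whatever the path, the implied conditional correlation stays equal to $\rho_{12}=0.4$, and $H_t$ is positive definite because $R$ is.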

Two limitations of the CCC model are, however, (i) its nonstability by aggregation and (ii) the 
arbitrary nature of the assumption of constant conditional correlations. 


11.2.4 Dynamic Conditional Correlations Models 


Dynamic conditional correlations GARCH (DCC-GARCH) models are an extension of CCC-GARCH, obtained by introducing a dynamic for the conditional correlation. Hence, the constant 
matrix $R$ in Definition 11.4 is replaced by a matrix $R_t$ which is measurable with respect to the past 
variables $\{\epsilon_u,u<t\}$. For reasons of parsimony, it seems reasonable to choose diagonal matrices $A_i$ 
and $B_j$ in (11.18), corresponding to univariate GARCH models for each component as in (11.17). 
Different DCC models are obtained depending on the specification of $R_t$. A simple formulation is 


$$R_t=\theta_1R+\theta_2\Psi_{t-1}+\theta_3R_{t-1}, \tag{11.19}$$


where the $\theta_i$ are positive weights summing to 1, $R$ is a constant correlation matrix, and $\Psi_{t-1}$ is 
the empirical correlation matrix of $\epsilon_{t-1},\dots,\epsilon_{t-M}$. The matrix $R_t$ is thus a correlation matrix (see 
Exercise 11.9). Equation (11.19) is reminiscent of the GARCH(1,1) specification, $\theta_1R$ playing the 
role of the parameter $\omega$, $\theta_2$ that of $\alpha$, and $\theta_3$ that of $\beta$. 

Another way of specifying the dynamics of $R_t$ is by setting 


$$R_t=(\mathrm{diag}\,Q_t)^{-1/2}\,Q_t\,(\mathrm{diag}\,Q_t)^{-1/2},$$


where $\mathrm{diag}\,Q_t$ is the diagonal matrix constructed with the diagonal elements of $Q_t$, and $Q_t$ is 
a sequence of covariance matrices which is measurable with respect to $\sigma(\epsilon_u,u<t)$. A natural 
parameterization is 

$$Q_t=\theta_1Q+\theta_2\epsilon_{t-1}\epsilon_{t-1}'+\theta_3Q_{t-1}, \tag{11.20}$$


where $Q$ is a covariance matrix. Again, the formulation recalls the GARCH(1,1) model. Though 
different, both specifications (11.19) and (11.20) allow us to test the assumption of constant conditional 
correlations, by considering the restriction $\theta_2=\theta_3=0$. Note that the same $\theta_2$ and $\theta_3$ 
coefficients appear in the different conditional correlations, which thus have very similar dynamics. 
The matrices $R$ and $Q$ are often estimated or replaced by the empirical correlation and covariance 
matrices. In this approach a DCC model of the form (11.19) or (11.20) thus introduces only two 
more parameters than the CCC formulation. 
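The correlation dynamics (11.20) can be sketched as follows. As suggested above, $Q$ is replaced by the empirical covariance matrix; the starting value $Q_1=Q$, the simulated data and the weights are our own choices. The normalization step is exactly $R_t=(\mathrm{diag}\,Q_t)^{-1/2}Q_t(\mathrm{diag}\,Q_t)^{-1/2}$, so each $R_t$ has a unit diagonal.

```python
import numpy as np

def dcc_correlations(eps, Q_bar, theta1, theta2, theta3):
    """R_t built from Q_t of (11.20); Q_1 = Q_bar is an arbitrary starting value."""
    n, m = eps.shape
    Q = Q_bar.copy()
    R = np.empty((n, m, m))
    for t in range(n):
        if t > 0:
            Q = theta1 * Q_bar + theta2 * np.outer(eps[t - 1], eps[t - 1]) + theta3 * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = d[:, None] * Q * d[None, :]   # (diag Q_t)^{-1/2} Q_t (diag Q_t)^{-1/2}
    return R

rng = np.random.default_rng(5)
eps = rng.standard_normal((300, 3))
Q_bar = np.cov(eps, rowvar=False)            # Q replaced by the empirical covariance
R = dcc_correlations(eps, Q_bar, 0.05, 0.05, 0.90)
```

Setting $\theta_2=\theta_3=0$ (hence $\theta_1=1$) freezes $R_t$, which is the constant-correlation restriction mentioned above.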


11.2.5 BEKK-GARCH Model 


The BEKK acronym refers to a specific parameterization of the multivariate GARCH model devel- 
oped by Baba, Engle, Kraft and Kroner, in a preliminary version of Engle and Kroner (1995). 


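Definition 11.5 (BEKK-GARCH($p,q$) process) Let $(\eta_t)$ denote an iid sequence with common 
distribution $\eta$. The process $(\epsilon_t)$ is called a strong GARCH($p,q$), with respect to the sequence $(\eta_t)$, 
if it satisfies 


$$\begin{cases}\epsilon_t=H_t^{1/2}\eta_t,\\ H_t=\Omega+\displaystyle\sum_{i=1}^{q}\sum_{k=1}^{K}A_{ik}\epsilon_{t-i}\epsilon_{t-i}'A_{ik}'+\sum_{j=1}^{p}\sum_{k=1}^{K}B_{jk}H_{t-j}B_{jk}',\end{cases}$$


where $K$ is an integer, $\Omega$, $A_{ik}$ and $B_{jk}$ are square $m\times m$ matrices, and $\Omega$ is positive definite. 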
The specification obviously ensures that if the matrices $H_{t-i}$, $i=1,\dots,p$, are almost surely 
positive definite, then so is $H_t$. 

To compare this model with the representation (11.9), let us derive the vector form of the 
equation for $H_t$. Using the relations (11.10) and (11.11), we get 


$$\mathrm{vech}(H_t)=\mathrm{vech}(\Omega)+\sum_{i=1}^{q}D_m^+\sum_{k=1}^{K}(A_{ik}\otimes A_{ik})D_m\,\mathrm{vech}(\epsilon_{t-i}\epsilon_{t-i}')+\sum_{j=1}^{p}D_m^+\sum_{k=1}^{K}(B_{jk}\otimes B_{jk})D_m\,\mathrm{vech}(H_{t-j}).$$


The model can thus be written in the form (11.9), with 


$$A^{(i)}=D_m^+\sum_{k=1}^{K}(A_{ik}\otimes A_{ik})D_m,\qquad B^{(j)}=D_m^+\sum_{k=1}^{K}(B_{jk}\otimes B_{jk})D_m, \tag{11.21}$$

for $i=1,\dots,q$ and $j=1,\dots,p$. In particular, it can be seen that the number of coefficients of 
a matrix $A^{(i)}$ in (11.9) is $[m(m+1)/2]^2$, whereas it is $Km^2$ in this particular case. 
The BEKK class contains (Exercise 11.13) the diagonal models obtained by choosing diagonal 
matrices $A_{ik}$ and $B_{jk}$. The following theorem establishes a converse to this property. 
Theorem 11.3 For the model defined by the diagonal vector representation (11.9) with 

$$A^{(i)}=\mathrm{diag}\{\mathrm{vech}(\tilde A^{(i)})\},\qquad B^{(j)}=\mathrm{diag}\{\mathrm{vech}(\tilde B^{(j)})\},$$


where $\tilde A^{(i)}=(\tilde a_{k\ell}^{(i)})$ and $\tilde B^{(j)}=(\tilde b_{k\ell}^{(j)})$ are $m\times m$ symmetric positive semi-definite matrices, there 
exist matrices $A_{ik}$ and $B_{jk}$ such that (11.21) holds, for $K=m$. 


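Proof. There exists an upper triangular matrix 


$$D^{(i)}=\begin{pmatrix}d_{1,m}^{(i)}&d_{1,m-1}^{(i)}&\cdots&d_{1,1}^{(i)}\\0&d_{2,m-1}^{(i)}&\cdots&d_{2,1}^{(i)}\\\vdots&&\ddots&\vdots\\0&\cdots&0&d_{m,1}^{(i)}\end{pmatrix}$$

such that $\tilde A^{(i)}=D^{(i)}D^{(i)'}$. Let $A_{ik}=\mathrm{diag}(d_{1,r}^{(i)},d_{2,r}^{(i)},\dots,d_{k,r}^{(i)},0,\dots,0)$ where $r=m-k+1$, 
for $k=1,\dots,m$. It is easy to show that the first equality in (11.21) is satisfied with $K=m$. The 
second equality is obtained similarly. 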


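Example 11.2 By way of illustration, consider the particular case where $m=2$, $q=K=1$ and 
$p=0$. If $A=(a_{ij})$ is a $2\times 2$ matrix, it is easy to see that 


$$D_2^+(A\otimes A)D_2=\begin{pmatrix}a_{11}^2&2a_{11}a_{12}&a_{12}^2\\a_{11}a_{21}&a_{11}a_{22}+a_{12}a_{21}&a_{12}a_{22}\\a_{21}^2&2a_{21}a_{22}&a_{22}^2\end{pmatrix}.$$


Hence, canceling out the unnecessary indices, 


$$\begin{aligned}h_{11,t}&=\omega_{11}+a_{11}^2\epsilon_{1,t-1}^2+2a_{11}a_{12}\epsilon_{1,t-1}\epsilon_{2,t-1}+a_{12}^2\epsilon_{2,t-1}^2,\\ h_{12,t}&=\omega_{12}+a_{11}a_{21}\epsilon_{1,t-1}^2+(a_{11}a_{22}+a_{12}a_{21})\epsilon_{1,t-1}\epsilon_{2,t-1}+a_{12}a_{22}\epsilon_{2,t-1}^2,\\ h_{22,t}&=\omega_{22}+a_{21}^2\epsilon_{1,t-1}^2+2a_{21}a_{22}\epsilon_{1,t-1}\epsilon_{2,t-1}+a_{22}^2\epsilon_{2,t-1}^2.\end{aligned}$$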
In particular, the diagonal models belonging to this class are of the form 


$$\begin{aligned}h_{11,t}&=\omega_{11}+a_{11}^2\epsilon_{1,t-1}^2,\\ h_{12,t}&=\omega_{12}+a_{11}a_{22}\epsilon_{1,t-1}\epsilon_{2,t-1},\\ h_{22,t}&=\omega_{22}+a_{22}^2\epsilon_{2,t-1}^2.\end{aligned}$$
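The first formula of (11.21) and the matrix of Example 11.2 can be checked numerically. The sketch below (hypothetical helper names, arbitrary coefficients) computes $D_m^+\sum_k(A_{ik}\otimes A_{ik})D_m$ and compares it with the explicit $3\times 3$ matrix of the example.

```python
import numpy as np

def vech(A):
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def duplication(m):
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[j * m + i, col] = D[i * m + j, col] = 1.0
            col += 1
    return D

def bekk_to_vec(A_list):
    """A^(i) = D_m^+ sum_k (A_ik kron A_ik) D_m, the first formula of (11.21)."""
    m = A_list[0].shape[0]
    D = duplication(m)
    D_plus = np.linalg.solve(D.T @ D, D.T)
    return D_plus @ sum(np.kron(A, A) for A in A_list) @ D

a11, a12, a21, a22 = 0.3, 0.1, -0.2, 0.4
A = np.array([[a11, a12], [a21, a22]])
expected = np.array([[a11**2, 2 * a11 * a12, a12**2],          # matrix of Example 11.2
                     [a11 * a21, a11 * a22 + a12 * a21, a12 * a22],
                     [a21**2, 2 * a21 * a22, a22**2]])
```

The same function handles $K>1$: the VEC matrix is simply the sum of the $K$ individual contributions.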


Remark 11.2 (Interpretation of the BEKK coefficients) Example 11.2 shows that the BEKK 
specification imposes highly artificial constraints on the volatilities and covolatilities of the com- 
ponents. As a consequence, the coefficients of a BEKK representation are difficult to interpret. 


Remark 11.3 (Identifiability) Identifiability of a BEKK representation requires additional constraints. Indeed, the same representation holds if $A_{ik}$ is replaced by $-A_{ik}$, or if the matrices 
$A_{1k},\dots,A_{qk}$ and $A_{1k'},\dots,A_{qk'}$ are permuted for $k\neq k'$. 


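Example 11.3 (A general and identifiable BEKK representation) Consider the case $m=2$, 
$q=1$ and $p=0$. Suppose that the distribution $\eta$ is nondegenerate, so that there exists no 
nontrivial constant linear combination of a finite number of the $\epsilon_{k,t-i}\epsilon_{\ell,t-i}$. Let 


$$H_t=\Omega+\sum_{k=1}^{m^2}A_k\epsilon_{t-1}\epsilon_{t-1}'A_k',$$


where $\Omega$ is a symmetric positive definite matrix, 

$$A_1=\begin{pmatrix}a_{11,1}&a_{12,1}\\a_{21,1}&a_{22,1}\end{pmatrix},\quad A_2=\begin{pmatrix}0&0\\a_{21,2}&a_{22,2}\end{pmatrix},\quad A_3=\begin{pmatrix}0&a_{12,3}\\0&a_{22,3}\end{pmatrix},\quad A_4=\begin{pmatrix}0&0\\0&a_{22,4}\end{pmatrix},$$

with $a_{11,1}>0$, $a_{12,3}>0$, $a_{21,2}>0$ and $a_{22,4}>0$. 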


Let us show that this BEKK representation is both identifiable and quite general. Easy, but 
tedious, computation shows that an expression of the form (11.9) holds with 


$$A^{(1)}=\sum_{k=1}^{4}D_2^+(A_k\otimes A_k)D_2=\begin{pmatrix}a_{11,1}^2&2a_{11,1}a_{12,1}&a_{12,1}^2+a_{12,3}^2\\a_{11,1}a_{21,1}&a_{12,1}a_{21,1}+a_{11,1}a_{22,1}&a_{12,1}a_{22,1}+a_{12,3}a_{22,3}\\a_{21,1}^2+a_{21,2}^2&2a_{21,1}a_{22,1}+2a_{21,2}a_{22,2}&a_{22,1}^2+a_{22,2}^2+a_{22,3}^2+a_{22,4}^2\end{pmatrix}.$$


In view of the sign constraint, the $(1,1)$th element of $A^{(1)}$ allows us to identify $a_{11,1}$. The $(1,2)$th 
and $(2,1)$th elements then allow us to find $a_{12,1}$ and $a_{21,1}$, whence the $(2,2)$th element yields $a_{22,1}$. 
The two elements of $A_3$ are deduced from the $(1,3)$th and $(2,3)$th elements of $A^{(1)}$, and from the 
constraint $a_{12,3}>0$ (which could be replaced by a constraint on the sign of $a_{22,3}$). $A_2$ is identified 
similarly, and the nonzero element of $A_4$ is finally identified by considering the $(3,3)$th element 
of $A^{(1)}$. 

In this example, the BEKK representation contains the same number of parameters as the 
corresponding VEC representation, but has the advantage of automatically providing a positive 
definite solution $H_t$. 


It is interesting to consider the stability by aggregation of the BEKK class. 
Theorem 11.4 (Stability of the BEKK model by aggregation) Let $(\epsilon_t)$ be a BEKK-GARCH($p,q$) 
process. Then, for any invertible $m\times m$ matrix $P$, the process $\tilde\epsilon_t=P\epsilon_t$ is a BEKK-GARCH($p,q$) process. 


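Proof. Letting $\tilde H_t=PH_tP'$, $\tilde\Omega=P\Omega P'$, $\tilde A_{ik}=PA_{ik}P^{-1}$, and $\tilde B_{jk}=PB_{jk}P^{-1}$, we get 

$$\tilde\epsilon_t=\tilde H_t^{1/2}\eta_t,\qquad \tilde H_t=\tilde\Omega+\sum_{i=1}^{q}\sum_{k=1}^{K}\tilde A_{ik}\tilde\epsilon_{t-i}\tilde\epsilon_{t-i}'\tilde A_{ik}'+\sum_{j=1}^{p}\sum_{k=1}^{K}\tilde B_{jk}\tilde H_{t-j}\tilde B_{jk}'$$


and, $\tilde\Omega$ being a positive definite matrix, the result is proved. 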


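As in the univariate case, the 'square' of the $(\epsilon_t)$ process is the solution of an ARMA model. 
Indeed, define the innovation of the process $\mathrm{vec}(\epsilon_t\epsilon_t')$: 


$$\nu_t=\mathrm{vec}(\epsilon_t\epsilon_t')-\mathrm{vec}[E(\epsilon_t\epsilon_t'\mid\epsilon_u,u<t)]=\mathrm{vec}(\epsilon_t\epsilon_t')-\mathrm{vec}(H_t). \tag{11.22}$$


Applying the vec operator, and substituting the variables $\mathrm{vec}(H_{t-j})$ in the model of Definition 
11.5 by $\mathrm{vec}(\epsilon_{t-j}\epsilon_{t-j}')-\nu_{t-j}$, we get the representation 

$$\mathrm{vec}(\epsilon_t\epsilon_t')=\mathrm{vec}(\Omega)+\sum_{i=1}^{r}\sum_{k=1}^{K}\{A_{ik}\otimes A_{ik}+B_{ik}\otimes B_{ik}\}\,\mathrm{vec}(\epsilon_{t-i}\epsilon_{t-i}')+\nu_t-\sum_{j=1}^{p}\sum_{k=1}^{K}(B_{jk}\otimes B_{jk})\,\nu_{t-j},\qquad t\in\mathbb{Z}, \tag{11.23}$$


where $r=\max(p,q)$, with the convention $A_{ik}=0$ ($B_{jk}=0$) if $i>q$ ($j>p$). This representation 
cannot be used to obtain stationarity conditions because the process $(\nu_t)$ is not iid in general. 
However, it can be used to derive the second-order moment, when it exists, of the process $\epsilon_t$ as 

$$E\{\mathrm{vec}(\epsilon_t\epsilon_t')\}=\mathrm{vec}(\Omega)+\sum_{i=1}^{r}\sum_{k=1}^{K}\{A_{ik}\otimes A_{ik}+B_{ik}\otimes B_{ik}\}\,E\{\mathrm{vec}(\epsilon_t\epsilon_t')\},$$


that is, 


$$E\{\mathrm{vec}(\epsilon_t\epsilon_t')\}=\Big\{I_{m^2}-\sum_{i=1}^{r}\sum_{k=1}^{K}(A_{ik}\otimes A_{ik}+B_{ik}\otimes B_{ik})\Big\}^{-1}\mathrm{vec}(\Omega),$$


provided that the matrix in braces is nonsingular. 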
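The moment formula can be evaluated with a few lines of NumPy. The sketch below assumes $K=1$, a single lag and illustrative matrices $A$, $B$. As a cross-check, it also iterates the matrix recursion $S\mapsto\Omega+ASA'+BSB'$, which converges to the same limit when the spectral radius of $A\otimes A+B\otimes B$ is below 1 (true for these values: every row sum of that matrix is at most 0.845).

```python
import numpy as np

def bekk_unconditional_variance(Omega, A_list, B_list):
    """E(eps_t eps_t') from vec(Sigma) = {I - sum(A kron A + B kron B)}^{-1} vec(Omega).
    A_list / B_list hold all the matrices A_ik / B_jk (the lag index plays no role here).
    The formula requires the bracketed matrix to be nonsingular."""
    m = Omega.shape[0]
    M = sum(np.kron(A, A) for A in A_list) + sum(np.kron(B, B) for B in B_list)
    vec_sigma = np.linalg.solve(np.eye(m * m) - M, Omega.reshape(-1, order="F"))
    return vec_sigma.reshape(m, m, order="F")

Omega = np.array([[0.10, 0.02], [0.02, 0.20]])
A = np.array([[0.30, 0.05], [0.00, 0.25]])
B = np.array([[0.85, 0.00], [0.05, 0.80]])
Sigma = bekk_unconditional_variance(Omega, [A], [B])

# Cross-check: fixed-point iteration of E(H_t) = Omega + A E(H_t) A' + B E(H_t) B'
S = Omega.copy()
for _ in range(2000):
    S = Omega + A @ S @ A.T + B @ S @ B.T
```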


11.2.6 Factor GARCH Models 


In these models, it is assumed that a nonsingular linear combination $f_t$ of the $m$ components 
of $\epsilon_t$, or an exogenous variable summarizing the comovements of the components, has a 
GARCH structure. 


Factor models with idiosyncratic noise 


A very popular factor model links individual returns $\epsilon_{it}$ to the market return $f_t$ through a regression model 


$$\epsilon_{it}=\beta_if_t+\eta_{it},\qquad i=1,\dots,m. \tag{11.24}$$


The parameter $\beta_i$ can be interpreted as a sensitivity to the factor, and the noise $\eta_{it}$ as a specific 
risk (often called idiosyncratic risk) which is conditionally uncorrelated with $f_t$. It follows that 
$H_t=\Omega+\lambda_t\beta\beta'$ where $\beta$ is the vector of sensitivities, $\lambda_t$ is the conditional variance of $f_t$ and $\Omega$ 
is the covariance matrix of the idiosyncratic terms. More generally, assuming the existence of $r$ 
conditionally uncorrelated factors, we obtain the decomposition 


$$H_t=\Omega+\sum_{j=1}^{r}\lambda_{jt}\beta_j\beta_j'. \tag{11.25}$$


It is not restrictive to assume that the factors are linear combinations of the components of $\epsilon_t$ 
(Exercise 11.10). If, in addition, the conditional variances $\lambda_{jt}$ are specified as univariate GARCH, 
the model remains parsimonious in terms of unknown parameters and (11.25) reduces to a particular BEKK model (Exercise 11.11). If $\Omega$ is chosen to be positive definite and if the univariate 
series $(\lambda_{jt})_t$, $j=1,\dots,r$, are independent, strictly and second-order stationary, then it is clear 
that (11.25) defines a sequence of positive definite matrices $(H_t)$ that are strictly and second-order stationary. 


Principal components GARCH model 


The concept of factor is central to principal components analysis (PCA) and to other methods of 
exploratory data analysis. PCA relies on decomposing the covariance matrix $V$ of $m$ quantitative 
variables as $V=P\Lambda P'$, where $\Lambda$ is a diagonal matrix whose elements are the eigenvalues $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_m$ of $V$, and where $P$ is the orthonormal matrix of the corresponding eigenvectors. 
The first principal component is the linear combination of the $m$ variables, with weights given by 
the first column of $P$, which, in some sense, is the factor which best summarizes the set of $m$ 
variables (Exercise 11.12). There exist $m$ principal components, which are uncorrelated and whose 
variances $\lambda_1,\dots,\lambda_m$ (and hence whose explanatory powers) are in decreasing order. It is natural 
to consider this method for extracting the key factors of the volatilities of the $m$ components 
of $\epsilon_t$. 
We obtain a principal component GARCH (PC-GARCH) or orthogonal GARCH (O-GARCH) 
model by assuming that 

$$H_t=P\Lambda_tP', \tag{11.26}$$


where $P$ is an orthogonal matrix ($P'=P^{-1}$) and $\Lambda_t=\mathrm{diag}(\lambda_{1t},\dots,\lambda_{mt})$, where the $\lambda_{it}$ are the 
volatilities, which can be obtained from univariate GARCH-type models. This is equivalent to 
assuming 

$$\epsilon_t=Pf_t, \tag{11.27}$$


where $f_t=P'\epsilon_t$ is the principal component vector, whose components are orthogonal factors. If 
univariate GARCH(1,1) models are used for the factors $f_{jt}=\sum_{i=1}^{m}P(i,j)\epsilon_{it}$, then 


$$\lambda_{jt}=\omega_j+\alpha_jf_{j,t-1}^2+\beta_j\lambda_{j,t-1}. \tag{11.28}$$


Remark 11.4 (Interpretation, factor estimation and extensions) 


1. Model (11.26) can also be interpreted as a full-factor GARCH (FF-GARCH) model, that 
is, a model with as many factors as components and no idiosyncratic term. Let $P(\cdot,j)$ be 
the $j$th column of $P$ (an eigenvector of $H_t$ associated with the eigenvalue $\lambda_{jt}$). We get a 
spectral expression for the conditional variance, 


$$H_t=\sum_{j=1}^{m}\lambda_{jt}P(\cdot,j)P(\cdot,j)',$$

which is of the form (11.25) with an idiosyncratic variance $\Omega=0$. 


2. A PCA of the conditional variance $H_t$ should, in full generality, give $H_t=P_t\Lambda_tP_t'$ with 
factors (that is, principal components) $f_t=P_t'\epsilon_t$. Model (11.26) thus assumes that all factors 
are linear combinations, with fixed coefficients, of the same returns $\epsilon_{it}$. For instance, the first 
factor $f_{1t}$ is the conditionally most risky factor (with the largest conditional variance $\lambda_{1t}$, see 
Exercise 11.12). But since it is assumed that the direction of $f_{1t}$ is fixed, in the subspace of 
$\mathbb{R}^m$ generated by the components of $\epsilon_t$, the first factor is also the most risky unconditionally. 
This can be seen through the PCA of the unconditional variance $H=EH_t=P\Lambda P'$, which 
is assumed to exist. 


3. It is easy to estimate $P$ by applying PCA to the empirical variance $\hat H=n^{-1}\sum_{t=1}^{n}(\epsilon_t-\bar\epsilon)(\epsilon_t-\bar\epsilon)'$, where $\bar\epsilon=n^{-1}\sum_{t=1}^{n}\epsilon_t$. The components of $\hat P'\epsilon_t$ are specified as GARCH-type univariate models. Estimation of the conditional variance $H_t=P\Lambda_tP'$ thus reduces 
to estimating $m$ univariate models. 


4. It is common practice to apply PCA on centered and standardized data, in order to remove 
the influence of the units of the various variables. For returns $\epsilon_{it}$, standardization does not 
seem appropriate if one wishes to retain a size effect, that is, if one expects an asset with 
a relatively large variance to have more weight in the riskier factors. 


5. In the spirit of the standard PCA, it is possible to only consider the first $r$ principal components, which are the key factors of the system. The variance $H_t$ is thus approximated by 


$$\hat H_t=\hat P\,\mathrm{diag}(\hat\lambda_{1t},\dots,\hat\lambda_{rt},0_{m-r})\,\hat P', \tag{11.29}$$


where the $\hat\lambda_{it}$ are estimated from simple univariate models, such as GARCH(1,1) models 
of the form (11.28), the matrix $\hat P$ is obtained from PCA of the empirical covariance matrix 
$\hat H=\hat P\,\mathrm{diag}(\hat\lambda_1,\dots,\hat\lambda_m)\hat P'$, and the factors are approximated by $\hat f_t=\hat P'\epsilon_t$. Instead of the 
approximation (11.29), one can use 


$$\hat H_t=\hat P\,\mathrm{diag}(\hat\lambda_{1t},\dots,\hat\lambda_{rt},\hat\lambda_{r+1},\dots,\hat\lambda_m)\,\hat P'. \tag{11.30}$$


The approximation in (11.30) is as simple as (11.29) and does not require additional computations (in particular, the $r$ GARCH equations are retained) but has the advantage of 
providing an almost surely invertible estimation of $H_t$ (for fixed $n$), which is required in 
the computation of certain statistics (such as the AIC-type information criteria based on the 
Gaussian log-likelihood). 


6. Note that the assumption that P is orthogonal can be restrictive. The class of gen- 
eralized orthogonal GARCH (GO-GARCH) processes assumes only that P is any 
nonsingular matrix. 


11.3 Stationarity 


In this section, we will first discuss the difficulty of establishing stationarity conditions, or the 
existence of moments, for multivariate GARCH models. For the general vector model (11.9), and in 
particular for the BEKK model, there exist sufficient stationarity conditions. The stationary solution 
being nonexplicit, we propose an algorithm that converges, under certain assumptions, to the 
stationary solution. We will then see that the problem is much simpler for the CCC model (11.18). 


11.3.1 Stationarity of VEC and BEKK Models 


It is not possible to provide stationary solutions, in explicit form, for the general VEC model (11.9). 
To illustrate the difficulty, recall that a univariate ARCH(1) model admits a solution €; = opn, with 
o explicitly given as a function of {7,_,,, u > 0} as the square root of 


o? = o +an ok, = ofl +an? +a? + I, 
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provided that the series converges almost surely. Now consider a bivariate model of the form 
(11.6) with H, = h + we;_\€{_,, where œ is assumed, for the sake of simplicity, to be scalar and 


positive. Also choose H” * to be lower triangular so as to have (11.7). Then 


wd 
Aig = Ltoahye-iny,_y 
2 2 1/2 
Aye = ahy- +A (hate—1ho2t—-1 — Aggy) © mi—120-1 
h h h —h? 
= je. 12,t—-1 „2 11,f—1722,t-1—"19,1-1 „2 
har = 1l Ohiri L-1 +e hij t-1 ,1-1 


7 1/2 
hy2,t-1 (hii -ih221-1 zhas) 
Les 


Teal 1,2-172,1-1- 


It can be seen that, given 7,1, the relationship between h,,, and hy; ;—; is linear, and can be 


iterated to yield 
oo i 
Au, = 1+ ya I] Mj 


i=l j=l 


under the constraint a < exp(—E log Hi): In contrast, the relationships between /12,+, or h22,,, and 
the components of H;—; are not linear, which makes it impossible to express h12, and h22, as a 
simple function of a, {7;—1, 7;-2,---, N-k} and H;_, for k > 1. This constitutes a major obstacle 
for determining sufficient stationarity conditions. 


Remark 11.5 (Stationarity does not follow from the ARMA model) Similar to (11.22), let- 
ting v; = vech(e;e/) — vech(H;), we obtain the ARMA representation 


r p 
vech(€,€,) = w + > Cvech(e,_j€/_;) +v — 2 BY vj, 


i=l j=l 


by setting C = A® + B® and by using the usual notation and conventions. In the literature, 
one may encounter the argument that the model is weakly stationary if the polynomial z bh 
det (Zs — )0;_, Cz’) has all its roots outside the unit circle (s = m(m + 1)/2). Although the 
result is certainly true with additional assumptions on the noise density (see Theorem 11.5 and the 
subsequent discussion), the argument is not correct since 


r P 


=f 
vech(e,€,) = (>: cn w+ v — >D BY v; 


i=l j=l 


constitutes a solution only if v, = vech(e,e/) — vech(H;) can be expressed as a function of 
{Nr-u, u > 0}. 


Boussama (2006) obtained the following stationarity condition. Recall that o (A) denotes the spec- 
tral radius of a square matrix A. 


Theorem 11.5 (Stationarity and ergodicity) There exists a strictly stationary and nonanticipa- 
tive solution of the vector GARCH model (11.9), if: 


(i) the positivity condition (11.15) is satisfied; 


(ii) the distribution of n has a density, positive on a neighborhood of 0, with respect to the 
Lebesgue measure on R; 


(iii) p QD, C®) <1. 


This solution is unique, B-mixing and ergodic. 
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In the particular case of the BEKK model (11.21), condition (iii) takes the form 


K f 
p (v; 2 die @ Aik + Bik ® Bi)? <i. 
k=1 i=l 

The proof of Theorem 11.5 relies on sophisticated algebraic tools. Assumption (ii) is a standard 
technical condition for showing the 6-mixing property (but is of no use for stationarity). Note that 
condition (iii), written as )*;_, a; + B; < 1 in the univariate case, is generally not necessary for 
the strict stationarity. 

This theorem does not provide explicit stationary solutions, that is, a relationship between €; 
and the 7;—;. However, it is possible to construct an algorithm which, when it converges, allows 
a stationary solution to the vector GARCH model (11.9) to be defined. 


Construction of a stationary solution 


For any t,k € Z, we define 
("=H =0,  whenk <0, 


and, recursively on k > 0, 


vech(H,* D) = w+ Aet Me D) 4 Y BO vech Hs Dy, (11.31) 
i=l jal 


with = HY 25 
Observe that, a k = |. 


k k 
HP = Aiea oume) and BAP = H®, 


where fy is a measurable function and H“ is a square matrix. (HY yi and (ef D), are thus 
stationary processes whose components take values in the Banach space L? of the (equivalence 
classes of) square integrable random variables. It is then clear that (11.9) admits a strictly stationary 
solution, which is nonanticipative and ergodic, if, for all t, 


HP converges almost surely when k —> œœ. (11.32) 


Indeed, letting HY r limg—+o0 HO" > and & = Hy! EA and taking the limit of each side of 


(11.31), we note that (11.9) is satisfied. Moreover, (€,) constitutes a strictly stationary and nonan- 
ticipative solution, because €; is a measurable function of {ņ„, u < t}. In view of Theorem A.1, 
such a process is also ergodic. Note also that if H, exists, it is symmetric and positive definite 
because the matrices HY” are symmetric and satisfy 


NHMA>NOQA>O, for dA #0. 
This solution (€;) is also second-order stationary if 
HY converges in L! when k —> oo. (11.33) 


Let g i i 
(k) ) (k=1) 
A® = vech | H® — H! . 


From Exercise 11.8 and its proof, we obtain (11.32), and hence the existence of strictly stationary 
solution to the vector GARCH equation (11.9), if there exists o €]0, 1[ such that aI] = O(p*) 
almost surely as k — oo, which is equivalent to 


1 
lim —log JA || <0, as. (11.34) 
k> k 
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Similarly, we obtain (11.33) if EAM | = O(p*). The criterion in (11.34) is not very explicit 
but the left-hand side of the inequality can be evaluated by simulation, just as for a Lyapunov 
coefficient. 


11.3.2 Stationarity of the CCC Model 
In model (11.18), letting 7, = R!/2n,, we get 


2 
i, 0 0 
O° 
€ = Y;h,, where Y; = 
Oo wes ey 


Multiplying by Y; the equation for h,, we thus have 
q P 
6 = Tot) TAic +) TBjh j, 
i=l j=l 


which can be written 


Z, = b, + AZ (11.35) 
where 
To e 
—t 
b, = b(n) = wv = Reta Z; = ak E Rut | 
0 oe 
0 h, p41 
and 
T,Aı YiAg Y,Bı YB, 
Im 0 0 0 0 
0 In 0 0 0 
0 Im 0 0 š 0 0 
TE (11.36) 
Ay Ag B, B, 
0 In 0 0 
0 0 Tn 0 
0 = 0 0 0 tee Im 0 


isa(p+q)m x (p + q)m matrix. 
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We obtain a vector representation, analogous to (2.16) obtained in the univariate case. This 
allows to state the following result. 


Theorem 11.6 (Strict stationarity of the CCC model) A necessary and sufficient condition for 
the existence of a strictly stationary and nonanticipative solution process for model (11.18) is y < 0, 
where y is the top Lyapunov exponent of the sequence {A;,t € Z} defined in (11.36). This stationary 
and nonanticipative solution, when y < 0, is unique and ergodic. 


Proof. The proof is similar to that of Theorem 2.4. The variables n, admitting a variance, the 
condition E logt || A;|| < œ is satisfied. 
It follows that when y < 0, the series 


CO 
Z, =b, + YO AAi- Andy na (11.37) 
n=0 


converges almost surely for all ¢. A strictly stationary solution to model (11.18) is obtained as 
= {diag (Z 41, R" n, where Z a denotes the (q + 1)th subvector of size m of Z,. This 
solution is thus nonanticipative and ergodic. The proof of the uniqueness is exactly the same as in 
the univariate case. 
The proof of the necessary part can also be easily adapted. From Lemma 2.1, it is sufficient 
to prove that lim;_,.o ||Ag... A_;|| = 0. It suffices to show that, for 1 <i < p+q, 
lim Ao... A-re; =0, as., (11.38) 


t>0oo 


where e, = e; ® Im and e; is the ith element of the canonical basis of R?*4, since any vector x 
of R”(P+4) can be uniquely decomposed as x = a e;Xi, where x; € R”. As in the univariate 
case, the existence of a strictly stationary solution implies that Ag... A_,b_,_, tends to 0, almost 
surely, as k — oo. It follows that, using the relation b_,_; = e; Y-x-1% + 25412, We have 

lim Ao... Axe, Y-z-19 = 0, lim Ao... A Ke, 4,0 = 9, a.s. (11.39) 


k->0oo k->0oo 


Since the components of œ are strictly positive, (11.38) thus holds for i = q + 1. Using 
A-Kg4i = Y_,Bie, + Bieg +1 + êg i= les P (11.40) 
with the convention that e pti = 0, for i = 1 we obtain 
0 = lim Ao... A-Keg41 > jim âo... A-k+12442 > 0, 


where the inequalities are taken componentwise. Therefore, (11.38) holds true for i = q + 2, and by 
induction, fori = q + j, j =1,..., p in view of (11.40). Moreover, since A-key = Y_rAge, + 
Age, 4 > (11.38) holds for i = q. We reach the same conclusion for the other values of i using an 
ascending recursion, as in the univariate case. 


The following result provides a necessary strict stationarity condition which is simple to check. 


Corollary 11.1 (Consequence of the strict stationarity) Let y denote the top Lyapunov expo- 
nent of the sequence {A,,t € Z} defined in (11.36). Consider the matrix polynomial defined by: 
B@) = Im — 2B, = . . . — 2" By, z € C. Let 


B, B2 B, 
Im 0 0 
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Then, if y < 0 the following equivalent properties hold: 


1. The roots of det B(z) are outside the unit disk. 


2. p(B) < 1. 


Proof. Because all the entries of the matrices A; are positive, it is clear that y is larger than 
the top Lyapunov exponent of the sequence (A¥) obtained by replacing the matrices A; by 0 in 
A,. It is easily seen that the top Lyapunov coefficient of (Af) coincides with that of the constant 
sequence equal to B, that is, with p(B). It follows that y > log p(B). Hence y < 0 entails that all 
the eigenvalues of B are outside the unit disk. Finally, in view of Exercise 11.14, the equivalence 
between the two properties follows from 


det(B — AImp) = (—1)”? det {AP In — A?~'By — ++» — ABy_1 — Bp} 


= (-A)'"? det B G) : A#0. 


Corollary 11.2 Suppose that y < 0. Let €; be the strictly stationary and nonanticipative solution 
of model (11.18). There exists s > 0 such that E||h,||* < œ and Elle; l|? < 00. 


Proof. It is shown in the proof of Corollary 2.3 that the strictly stationary solution defined 
by (11.37) satisfies £||ž, || < oo for some s >0. The conclusion follows from |le,|| < ||Z,|] and 
A.M < UZ, Il. 


11.4 Estimation of the CCC Model 


We now turn to the estimation of the m-dimensional CCC-GARCH(p, q) model by the quasi- 
maximum likelihood method. Recall that (€+) is called a CCC-GARCH(p, q) if it satisfies 


& = H; m, 
H, = D,RD,, D? = diag(h,), 
q P (11.41) 
2 2 
h, = oF yo Aig; T $ Bjh,_;, E= (GA ETa Ena) $ 
i=1 j=1 


where R is a correlation matrix, œ is a vector of size m x 1 with strictly positive coefficients, the 
A; and B; are matrices of size m x m with positive coefficients, and (n,) is an iid sequence of 
centered variables in R” with identity covariance matrix. 

As in the univariate case, the criterion is written as if the iid process were Gaussian. 

The parameters are the coefficients of the matrices w, A; and Bj, and the coefficients of the 
lower triangular part (excluding the diagonal) of the correlation matrix R = (p;;). The number of 
unknown parameters is thus 


59 =m+m*(p+q)4 
The parameter vector is denoted by 


0 = (Or, -ees 59)’ = (@',a},-.-,@,, By,---, By PY = a’, BY, oY, 
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where p = (p21, sees Pml, P32, » -+ Pm2,-+-, Pm,m-1), a= vec(A;), ae eee q, and Bj = 
vec(B;), j = 1,..., p. The parameter space is a subspace © of 


]0, +oo[” x [0, ol” PF) x] =f; Pane, 
The true parameter valued is denoted by 
o = (hs ys 25 gs Bors ++ Boys PO = (hy a, Bh, 26)! 


Before detailing the estimation procedure and its properties, we discuss the conditions that need 
to be imposed on the matrices A; and B; in order to ensure the uniqueness of the parameterization. 


11.4.1 Identifiability Conditions 


Let Ag(z) = $}; Az! and Bg(z) = Im — wy Bj;z/. By convention, Ag(z) = 0 if q = 0 and 
Bo(z) = In if p =0. 

If Bg(z) is nonsingular, that is, if the roots of det (Bg(z)) = 0 are outside the unit disk, we 
deduce from By(B)h, = œ + Ao(B)e, the representation 


h, = Bo(1)~'@ + Bo (B) ' Ag (B)e,. (11.42) 


In the vector case, assuming that the polynomials Abo and By, have no common root is insufficient 
to ensure that there exists no other pair (Ag, By), with the same degrees (p,q), such that 


Bo(B)~|Ag(B) = Bay (B)! Ao (B). (11.43) 
This condition is equivalent to the existence of an operator U(B) such that 
Ao(B) = U(B)Ag,(B) and Be(B) = U(B)Bo,(B), (11.44) 
this common factor vanishing in Byg(B)~!Ag(B) (Exercise 11.2). 
The polynomial U (B) is called unimodular if det{U(B)} is a nonzero constant. When the only 
common factors of the polynomials P(B) and Q(B) are unimodular, that is, when 
P(B) =U(B)P\(B), Q(B) = U(B)Qi(B) => det{U(B)} = constant, 
then P(B) and Q(B) are called left coprime. 
The following example shows that, in the vector case, assuming that A,,(B) and Bg,(B) are 


left coprime is insufficient to ensure that (11.43) has no solution 0 Æ 69 (in the univariate case this 
is sufficient because the condition By (0) = Ba (0) = 1 imposes U(B) = U (0) = 1). 


Example 11.4 (Nonidentifiable bivariate model) For m = 2, let 


_ (ù an(B) an(B) _ ( bu(B) bn(B) 
AD=( am ante) )> aD a poe) )> 


u®=(; i: 


deg(a21) = deg(ax2) =q, deg(aıı) <q, deg(ai2) < q 


with 


and 
deg(b21) = deg(ba2) = p, deg(bi1) < p, deg(bi2) < p. 
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The polynomial A(B) = U(B)Ag (B) has the same degree q as Ag, (B), and B(B) = U (B)Bo (B) 
is a polynomial of the same degree p as Bg, (B). On the other hand, U (B) has a nonzero determinant 
which is independent of B, hence it is unimodular. Moreover, B(0) = Bg, (0) = Im and A(0) = 
Ao (0) = 0. It is thus possible to find @ such that B(B) = Ba (B), A(B) = Ao (B) and w = U(1)a@o. 
The model is thus nonidentifiable, 0 and 6) corresponding to the same representation (11.42). 

Identifiability can be ensured by several types of conditions; see Reinsel (1997, pp. 37-40), 
Hannan (1976) or Hannan and Deistler (1988, sec. 2.7). To obtain a mild condition define, for 
any column i of the matrix operators Ag(B) and Bg(B), the maximal degrees q;(0) and p;(@), 
respectively. Suppose that maximal values are imposed for these orders, that is, 


vð € ©, Vi=1,...,.m, gi(@)<q and p;(@) < pi, (11.45) 


where q; < q and p; < p are fixed integers. Denote by aq; (i) (bp; (i)) the column vector of the 
coefficients of B% (BPi) in the ith column of Ap, (B) (Bo (B)). 


Example 11.5 (Illustration of the notation on an example) For 


= 1+a,B? ay2B _ 1+ d,,B* bi2B 
An (B) = ( a> B? + a3,B 1+a2B i Bo (B) 7 bo B4 1+ bn B i 


with a11421412422b11b21b12b22 A 0, we have 


qi(6) =2, q2(%)=1, pilo)=4, p2(%) =1 


aii ai2 bii bi2 
ae ( azı ). in ( an2 ). a ( boy L Ga ( bn ) 


Proposition 11.1 (A simple identifiability condition) Zf the matrix 


and 


M (Aw; Bo) = Lag, (1)--- Gan (m) bp (1)--- Dom (m)] (11.46) 


has full rank m, the parameters ao and Bo are identified by the constraints (11.45) with qi = qi (00) 
and pi = pi(0o) for any value of i. 


Proof. From the proof of the theorem in Hannan (1969), U(B) satisfying (11.44) is a unimodular 
matrix of the form U(B) = Uo + U,;B+...+U;,B*. Since the term of highest degree (column 
by column) of Ag, (B) is [aqg,(1)B” -- - ag,,(m)B ], the ith column of Ag(B) = U (B)Aa (B) is 
a polynomial in B of degree less than q; if and only if Ujag,(i) = 0, for j = 1,..., k. Similarly, 
we must have U;b,, (i) = 0, for j = 1,...,k andi = 1,...m. It follows that U; M (Ae, Boy) = 0, 
which implies that U; = 0 for j = 1,..., k thanks to condition (11.46). Consequently U (B) = Uo 
and, since, for all 6, Bg (0) = Im, we have U(B) = In. 


Example 11.6 (Illustration of the identifiability condition) In Example 11.4, 


M (Ary Bay) = la Du Db Db = | sa o] 


x x x x 


is not a full-rank matrix. Hence, the identifiability condition of Proposition 11.1 is not satisfied. 
Indeed, the model is not identifiable. 


A simpler, but more restrictive, condition is obtained by imposing the requirement that 


Mı (Ao: Ba) = [Ag Bp] 
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has full rank m. This entails uniqueness under the constraint that the degrees of Ag and Bọ are 
less than q and p, respectively. 


Example 11.7 (Another illustration of the identifiability condition) Turning again to Example 
11.5 with ai2b21 = a22b1; and, for instance, a2; = 0 and a2 Æ 0, observe that the matrix 


0O ay dy J 


MiA Ba) =| o an ba O0 


does not have full rank, but the matrix 


b b 
M (An, Bo) — | a1 412 11 12 l 


0 an bu bn 


does have full rank. 


More restrictive forms, such as the echelon form, are sometimes required to ensure identifiability. 


11.4.2 Asymptotic Properties of the QMLE 
of the CCC-GARCH model 


Let (€1,...,€n) be an observation of length n of the unique nonanticipative and strictly 
stationary solution (€,) of model (11.41). Conditionally on nonnegative initial values 


Ores Cogs Ngsecua hy_ps the Gaussian quasi-likelihood is written as 
Ln(O) = Ln (0; = Il l l ‘AO! 
n(0) = Ln (0; €1,-.-,€n) = ny" g pa P zé 1 &t)> 


t=1 
where the A, are recursively defined, for t > 1, by 


A, = D,RD,, Dd, = {diag(h,)}'/? 
h, = h,0@)=0+ VL, Aig; + UF Bih 


A QMLE of 0 is defined as any measurable solution Ê, such that 


6, = arg max L, (0) = arg minl, (9), (11.47) 
(ZS) S] 


where 


1@)=n' S04, G = iO) = e Aye + log| Al. 


Remark 11.6 (Choice of initial values) It will be shown later that, as in the univariate case, the 
initial values have no influence on the asymptotic properties of the estimator. These initial values 
can be fixed, for instance, so that 


Shag Sa 0. 


They can also be chosen as functions of 0, such as 


€p ET ity h; EE hip O, 
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or as random variable functions of the observations, such as 


Eir 
h,=€,= : , t=0,-1, ,l-r, 
2 
emt 
where the first r = max{ p, q} observations are denoted by €_,,..., €0. 


Let y (Ao) denote the top Lyapunov coefficient of the sequence of matrices Ap = (Aor) defined as 
in (11.36), at 6 = 6. The following assumptions will be used to establish the strong consistency 
of the QMLE. 


A1: o € © and © is compact. 

A2: y(Ao) < 0 and, for all 0 € ©, detB(z) = 0 => |z| > 1. 

A3: The components of n; are independent and their squares have nondegenerate distributions. 
A4: If p>0, then Ag, (z) and Bg,(z) are left coprime and M; (Ao, Bog) has full rank m. 


A5: R is a positive definite correlation matrix for all 6 € ©. 


If the space © is constrained by (11.45), that is, if maximal orders are imposed for each component 
of €, and h, in each equation, then assumption A4 can be replaced by the following more general 
condition: 


Ad: If p> 0, then Ag (z) and Bg, (z) are left coprime and M(Ag,, Bog) has full rank m. 
It will be useful to approximate the sequence (;(9)) by an ergodic and stationary sequence. 


Assumption A2 implies that, for all 0 € ©, the roots of Bg(z) are outside the unit disk. Denote by 
(h), = {h,(0)} , the strictly stationary, nonanticipative and ergodic solution of 


q p 
h,=o+) Aici t) Bih yi (11.48) 
i=l j=l 
Now, letting D, = {diag(h,)}'/? and H, = D,RD,, we define 


1,9) =O; €n, €n-1 ee) =A Db, le = (0) = ej Hye + log |H. 


t=1 


We are now in a position to state the following consistency theorem. 


Theorem 11.7 (Strong consistency) Let (6,) be a sequence of QMLEs satisfying (11.47). Then, 
under A1—A5 (or A1—A3, A4 and A5), 


6, > 0, almost surely when n > œœ. 


To establish the asymptotic normality we require the following additional assumptions: 


A6: 45 cO, where O is the interior of ©. 


A7: Elini? < 00. 
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Theorem 11.8 (Asymptotic normality) Under the assumptions of Theorem 11.7 and A6—A7, 
Vn, — 09) converges in distribution to N(O, J~'IJ~'), where J isa positive definite matrix and 
I is a positive semi-definite matrix, defined by 


pag (LLW gp (PLW) 
00 00’ 0000’ 


11.4.3 Proof of the Consistency and the Asymptotic Normality 
of the QML 


We shall use the multiplicative norm (see Exercises 11.5 and 11.6) defined by 


|All := sup Axl] = p'/(4’A), (11.49) 


xsl 


where A is a dı x d) matrix, ||x|| is the Euclidean norm of vector x € R®, and p(-) denotes the 
spectral radius. This norm satisfies, for any d2 x dı matrix B, 


IAI? < $a}; = TA'A) < bJ Al?, 1AA] < NAN, (11.50) 
ij 


1/2 1/2 


ITr(AB)| < | a7; Xoo] < {ddi} P ANIBIl. (11.51) 
i ij 


Proof of Theorem 11.7 


The proof is similar to that of Theorem 7.1 for the univariate case. 
Rewrite (11.48) in matrix form as 


H; = c, + BHi-1, (11.52) 
where B is defined in Corollary 11.1 and 
q 
h, o+) Ag; 
h, i=l 
H, = i n GE 0 ; (11.53) 
h, p41 i 


We shall establish the intermediate results (a), (c) and (d) which are stated as in the univariate 
case (see Section 7.4 of Chapter 7), result (b) being replaced by 


(by {h (0) = h, (0) Pa a.s. and R(0) = R(%)} => 6 = bo. 


Proof of (a): initial values are asymptotically irrelevant. In view of assumption A2 and 
Corollary 11.1, we have p(B) < 1. By the compactness of ©, we even have 


sup po (B) < 1. (11.54) 
060 


Iteratively using equation (11.52), as in the univariate case, we deduce that almost surely 


sup |H, —H,|| < Kp", Yt, (11.55) 
dcO 


where H, denotes the vector obtained by replacing the variables h,_; by hy; in H,. Observe that 
K is a random variable that depends on the past values {€;,t < 0}. Since K does not depend on 


n, it can be considered as a constant, such as p. From (11.55) we deduce that, almost surely, 
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sup |H; — H,|| < Kp’, Yt. (11.56) 
660 


Noting that ||R~!|| is the inverse of the eigenvalue of smallest modulus of R, and that || D> Wa 
{min;(ji1)}~', we have 


sup || 7" < sup || Õ7' ŽIRI < sup{minw(i)}~7||R“' || < K, (11.57) 
0€O cO 0cO ! 


using A5, the compactness of © and the strict positivity of the components of w. Similarly, 
we have 


sup ||H;"'|| < K. (11.58) 
060 
Now 
sup Iln (0) ee ln (0)| 
i) 
<n!) sup |e (H7 ' — Ahel +n EY sup [tog — tog). (11.59) 
ES) 


t= 1° 


The first sum can be written as 
n 


n`! > sup |e; H,'(H, — H,)H,'e,| 
t=] 9€9 


aS sup | Tr {ce}, '(H, — H,)H, ‘e,}| 
f= 1 PEO 


iy is) Tr (Ar ‘CA, — ADH, aA 
‘= | 269 


n 


< Kn! $ sup Ap! WA — All| Ay Mee 


f= 1 OEO 


n 
< Kn} olee; 


t=1 


—>0 


as n— oo, using (11.51), (11.56), (11.57), (11.58), the Cesaro lemma and the fact that 
p'llerel || = ptele > 0 a.s. Now, using (11.50), the triangle inequality and, for x >—1, 
log(1 + x) < x, we have 


log |H,| — log |A;| = log |Im + (H, — Ay) A; "| 
< mlog || Im + (H; — ADA; 
< mlog(lIm|| + ICH; — ÅD ÁT ID 
< mlog(1 + ||(H; — Ay) Ay") 
< m|| H; — AAS I, 


> The latter statement can be shown by using the Borel—Cantelli lemma, the Markov inequality and by 
applying Corollary 11.2: 


oo CO: st oo pt 2s 
E eje) Elle, 
y Pideadey UN S eee l l 


t=1 t=1 t= 
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and, by symmetry, 
log |H,| — log |H;| < m|| H, — A, ||| H,"' |). 


Using (11.56), (11.57) and (11.58) again, we deduce that the second sum in (11.59) tends to 0. 
We have thus shown that almost surely as n > ~, 


sup [y (8) Car i, ()| —> 0. 
dcO 


Proof of (b)’: identifiability of the parameter. Suppose that, for some 6 Æ 4, 
h,(0) =h,(60), Po-a.s. and R(@) = R(6o). 


Then it readily follows that ọ = po and, using the invertibility of the polynomial Bg(B) under 
assumption A2, by (11.42), 


Bo(1)~'@ + Ba (B) Ao (Bye, = Ba (1)~'@p + Bo, (B) Aa (Be, 
that is, 
Ba) w — Be) '@) = {Ba (B) "Ag (B) — Bo(B)~' Ao(B)}e, 
= P(B)e, as. Vt. 


Let P(B) = yo P; B'. Noting that Py = P(0) = 0 and isolating the terms that are functions 
of 1-1, 
Pi (hin-i e.’ ETEEN eee = Zi-2, a.S., 


where Z;—2 belongs to the o-field generated by {n;—2, ;-3, ...}. Since 7;_; is independent of this 
o-field, Exercise 11.3 shows that the latter equality contradicts A3 unless, for i, j =1,...,m, 
Pijhjj = 0 almost surely, where the p;; are the entries of Pı. Because Ajj s >0 for all j, we 
thus have Pı = 0. Similarly, we show that P(B) = 0 by successively considering the past values 
of n,-1. Therefore, in view of A4 (or A4’), we have œ = a and B = fo (see Section 11.4.1). It 
readily follows that w = wy. Hence 6 = 0. We have thus established (b)’. 


Proof of (c): the limit criterion is minimized at the true value. As in the univariate case, we 
first show that Eg, €;(@) is well defined on R U {+00} for all 6, and on R for 6 = 6. We have 


Em; (0) < Ea log™ |H;| < max{0, —log(|R| min @(i)")} < o0. 
At 0o, Jensen’s inequality, the second inequality in (11.50) and Corollary 11.2 entail that 


m 
Eo log | H; (80)| = Eom log |H; (Qo) |5/" 


IA 


m S m 5 2s 
ve Ea || Hi (0) |" < = los Ev RII" ||D: @o) II" 


IA 


m 2s 
K+ ~ log Eo || Dr (60) |" 


are log Eg, (max hji,,(90))° 
E i 


5/2 
m 
< K +— log E h? (0 
= x+ Biogen [Zio] 


L 


ll 
> 


m 
F — log Eey||2, G0) |" < 00. 


MULTIVARIATE GARCH PROCESSES 299 


It follows that 


Bays (80) = Eon [ni H; (00) Hy Go)! He 80)": + log |H: (60) } 
= m + Eg log |H; (80)| < oo. 
Because Ea,£; (09) < 00, the existence of Ee,£: (8o) in R holds. It is thus not restrictive to study 


the minimum of E,,¢;(9) for the values of O such that Eg,|€;(@)| < oo. Denoting by Aj;,;, the 
positive eigenvalues of H, (0) H, (0) (see Exercise 11.15), we have 


Eol: (0) — Eol: (80) 
| H,(0)| 
|H; (0) 
= Ey, log{|H,(0)H, | (60) |} 


= Eq log (6) H"(6) Hy!” 


+ Ee {niLH, (00) — Inne} 


1/2 


+ Tr (Eo, [LH; (60)' Hy 18) Ho) — In}} EGnm)) 


= Eo, log{|H,(0)H,' ()|} + Eo, (Tr {LH (0) H;'(@) — In]}) 


= Evy [Eo -1 -wga >0 


j= 


because logx < x — 1 for all x >0. Since logx = x — 1 if and only if x = 1, the inequality is 
strict unless if, for all i, A; = 1 Pop-a.s., that is, if H;(@) = A; (0o), Poy-a.s. (by Exercise 11.15). 
This equality is equivalent to 


h,(@) = h, (80), Pog-a.s., R(@) = R(80) 


and thus to 6 = 69, from (bY. 


Proof of (d). The last part of the proof of the consistency uses the compactness of © and the 
ergodicity of (¢,;(@)), as in the univariate case. 
Theorem 11.7 is thus established. 


Proof of Theorem 11.8 

We start by stating a few elementary results on the differentiation of expressions involving matrices. 
If f(A) is a real-valued function of a matrix A whose entries a;; are functions of some variable 
x, the chain rule for differentiation of compositions of functions states that 


af(A df (A) dai; af (A) dA 
a FCA) dai; | TA | (11.60) 
Ox ðaij Ox dA’ Ox 
Moreover, for A invertible we have 
dc’ Ac 
DA — cc’, (11.61) 
dTr(C A’ BA’ 
SCBA, = C'AB' + B'AC', (11.62) 


0A’ 
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ð log |det(A)| = 95 


L l (11.63) 
> _ Sa (11.64) 

A Beg (11.65) 
ed = BG (11.66) 


(a) First derivative of the criterion. Applying (11.60) and (11.61), then (11.62), (11.63) and 
(11.64), we obtain 


34, (0 əƏD7'R-! D7! Ə log |detD;| 
= T es ee E 2 
00; i (< 00; j 30; 


aD 
= -Tr Ca + R' D} 'e,e!) De! pr’) 
i 


aD 
42Tr (r 2), (11.67) 


for i = 1,...,sı =m + (p +q)m?, and using (11.65), 


3L (0) 
00; 


Hine _1 p-19R _,9R 
-r(x lD- leelD7'R a a(r EI (11.68) 


for i = sı + 1, ..., so. Letting Dor = D: (60), Ro = R(80), 


Dy = — (0), Ry) = E (60). Dy” = = (0), RS? = a (60). 
and fj, = R'/*n,, the score vector is written as 
— = Tr | (Im — R'A) DE Da + (Im — fie) Dg DY} (11.69) 
fori = I; s and 
— = Tef (In = R'A) BoR (11.70) 
fori = sı +1,..., So. 


(b) Existence of moments of any order for the score. In view of (11.51) and the 
Cauchy—Schwarz inequality, we obtain 
\" 


2 r 
-1 phi) 
E | Do Do 


E ƏL, (80) 3L; (Ao) 
30; 30; 


<K [e 51D? 


ford, J = Myc. 65815 


-1 ph 
< KE|D,' Dj, 


d 


ƏL: (O0) 3L: (80) 
3d 30; 
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fori =1,...,5, and j = sı + 1,..., So, and 


E ƏLi (00) 3L: (80) 
3A; 30; 


for i, j = sı +1,..., sọ. Note also that 


To show that the score admits a second-order moment, it is thus sufficient to prove that 


1 əh, (i1) 
h,(ii) 96; 


for alli; = 1,...,m, alli =1,..., 5, and ro = 2. By (11.52) and (11.54), 


OH; . 
sup < 00, i=l,...,m, 
peo || 96; 
and, setting s2 = m + qm°?, 
oH, 
0; < H,, i=m+1 
30i 


k 
ôH, => D J—IpOR iN i=s +1 S1 
30; f tt—k? genes , 


where B® = 3B/30; is a matrix whose entries are all 0, apart from a 1 located at the same place 
as 0; in B. In an abuse of notation, we denote by H,(i;) and ho (i1) the ijth components of H, 
and h,(00). With arguments similar to those used in the univariate case, that is, the inequality 
x/(1+x) < x°* for all x > 0 and s € [0, 1], and the inequalities 


|: ers g, oH ue x oe) l 
Lx $k cae - as a “Gi, Wee) 
k=1 k=1 j=l 


and, setting œ = infi <i<m @(i), 


H(i) 2 o+ DB, Wey, Yk, 
j=l 


we obtain 


a P 2 s/ro m 
6, dH; (i) = Kin, GeV ‘frases 
< EEO M eck. k 1 i 
H, (i1) 96; = 23 oO 3 Phe k G1) 


j=l k=1 jı=1 k=1 


where the constants p;, (which also depend on i1, s and ro) belong to the interval [0, 1). Noting 


ie) 
that these inequalities are uniform on a neighborhood of 0 €@, that they can be extended to 
higher-order derivatives, as in the univariate case, and that Corollary 11.2 implies that ||c, ||; < 00, 
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we can show a stronger result than the one stated: for all i; = 1,...,m, alli, j,k =1,..., s1 and 
all ro > 0, there exists a neighborhood V(69) of 0o such that 


1 dh) |P 
E sup - (0) <œ, (11.71) 
bevo) |A) 96; 
1 07h, i 
E gip | | es (11.72) 
GeV) | 2,41) 00:90; 
and 
1 h(i o 
E sup 1 FED ig) < 00. (11.73) 
Bevon) | 2, (i1) 06; 90; 0% 


(c) Asymptotic normality of the score vector. Clearly, {0€;(6))/00}, is stationary and 
d£;(99)/00 is measurable with respect to the o-field F, generated by {n,,u < t}. From (11.69) 
and (11.70), we have E {0£;(09)/00 | F:;-1} = 0. Property (b), and in particular (11.71), ensures 
the existence of the matrix 

ƏLi (80) 3L; (80) 


TSE 
00 00’ 


It follows that, for all à € R?*9*!, the sequence { xX L (80), F,} 3 is an ergodic, stationary and 
square integrable martingale difference. Corollary A.1 entails that 


Eee) É 
ae 2 9g 160) SNO, D. 
(d) Higher-order derivatives of the criterion. Starting from (a) and applying (11.60) and (11.65) 
several times, as well as (11.66), we obtain 


342 (0) 
= Tr A Lf = lysie St, 
30:30; (c1 +c2 + c3) J 1 
where 
aD D aD aD 
=I pa=1j,a1 Ct Fal. pel Ot 21 CM taal. =t p=1 Rat Cet 
C= D; IR D; 30; D; €,€,D, 90; D; 96, P leel D; R D; 30; 
S -i -in D: 43D: e hg cy OP D) 
+D; 'eel D'R D7) T D; 20) Dee D7 RD, "36,00," 
aD aD ə D 
o = —2D7!—p-!— +2p7!—_—., 
00; 30; 30:30; 


and c3 is obtained by permuting €;€/ and R7! in cı. We also obtain 
342 (0) 
30:90; 


ae? (0) 
30:30; 


= Tr (c4 + cs), i=1,...,s1, j=s1+1,...,50, 


= Tr (c6); i, j =s +1,..., 50, 
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where 
iser D 9 ðR 
a= R" D; 6, Dy'e€;D,'R 90," 
aR aR aR aR 
=R D ee DI R =R R R D ed D R€ 
a PARR S ga gg ape a Bg, 
A. beeen R = OR OR.) OR 
=R! D7 'ere| D7 'R —R SRRA, 
30:30; 30:30; a0; 30; 


and cs is obtained by permuting €e; and 0D,/00; in c4. Results (11.71) and (11.72) ensure the 
existence of the matrix J := E 324: (00) /0000', which is invertible, as shown in (e) below. Note 
that with our parameterization, 07 R/06;30 = 0, 

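Continuing the differentiations, it can be seen that $\partial^3\ell_t(\theta)/\partial\theta_i\partial\theta_j\partial\theta_k$ is also the trace of a sum of products of matrices similar to the $c_i$. The integrable matrix $\epsilon_t\epsilon_t'$ appears at most once in each of these products. The other terms are, on the one hand, the bounded matrices $R^{-1}$, $\partial R/\partial\theta_i$ and $D_t^{-1}$ and, on the other hand, the matrices $D_t^{-1}\partial D_t/\partial\theta_i$, $D_t^{-1}\partial^2D_t/\partial\theta_i\partial\theta_j$ and $D_t^{-1}\partial^3D_t/\partial\theta_i\partial\theta_j\partial\theta_k$. From (11.71)–(11.73), the norms of the latter three matrices admit moments of any order in the neighborhood of $\theta_0$. This shows that

$$E\sup_{\theta\in V(\theta_0)}\left|\frac{\partial^3\ell_t(\theta)}{\partial\theta_i\partial\theta_j\partial\theta_k}\right| < \infty.$$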


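(e) Invertibility of the matrix $J$. The expression for $J$ obtained in (d), as a function of the partial derivatives of $D_t$ and $R$, is not in a convenient form for showing its invertibility. We start by writing $J$ as a function of $H_t$ and of its derivatives. Starting from

$$\ell_t(\theta) = \epsilon_t'H_t^{-1}\epsilon_t + \log|H_t|,$$

the differentiation formulas (11.60), (11.63) and (11.65) give

$$\frac{\partial\ell_t(\theta)}{\partial\theta_i} = \mathrm{Tr}\left\{\left(H_t^{-1} - H_t^{-1}\epsilon_t\epsilon_t'H_t^{-1}\right)\frac{\partial H_t}{\partial\theta_i}\right\},$$

and then, using (11.64) and (11.66),

$$\frac{\partial^2\ell_t(\theta)}{\partial\theta_i\partial\theta_j} = \mathrm{Tr}\left(H_t^{-1}\frac{\partial^2H_t}{\partial\theta_i\partial\theta_j}\right) - \mathrm{Tr}\left(H_t^{-1}\frac{\partial H_t}{\partial\theta_j}H_t^{-1}\frac{\partial H_t}{\partial\theta_i}\right) + \mathrm{Tr}\left(H_t^{-1}\epsilon_t\epsilon_t'H_t^{-1}\frac{\partial H_t}{\partial\theta_j}H_t^{-1}\frac{\partial H_t}{\partial\theta_i}\right)$$
$$\qquad + \mathrm{Tr}\left(H_t^{-1}\frac{\partial H_t}{\partial\theta_j}H_t^{-1}\epsilon_t\epsilon_t'H_t^{-1}\frac{\partial H_t}{\partial\theta_i}\right) - \mathrm{Tr}\left(H_t^{-1}\epsilon_t\epsilon_t'H_t^{-1}\frac{\partial^2H_t}{\partial\theta_i\partial\theta_j}\right).$$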
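The first-derivative formula can be checked numerically. The sketch below is only an illustration under an assumed toy model: a hypothetical bivariate parameterization $H(\theta) = D(\theta)RD(\theta)$ with $D = \mathrm{diag}(\sqrt{\theta_1}, \sqrt{\theta_2})$ and correlation $\theta_3$ (the names `H_of`, `ell`, `score_trace` and `score_numeric` are ours). It compares $\mathrm{Tr}\{(H^{-1} - H^{-1}\epsilon\epsilon'H^{-1})\partial H/\partial\theta_i\}$ with a central finite difference of $\ell(\theta) = \epsilon'H^{-1}\epsilon + \log|H|$.

```python
import numpy as np


def H_of(theta):
    """Hypothetical 2x2 covariance D R D: variances theta[0], theta[1], correlation theta[2]."""
    h1, h2, rho = theta
    D = np.diag(np.sqrt([h1, h2]))
    R = np.array([[1.0, rho], [rho, 1.0]])
    return D @ R @ D


def ell(theta, eps):
    """Criterion ell(theta) = eps' H^{-1} eps + log|H|."""
    H = H_of(theta)
    _, logdet = np.linalg.slogdet(H)
    return eps @ np.linalg.solve(H, eps) + logdet


def score_trace(theta, eps, h=1e-6):
    """Tr{(H^{-1} - H^{-1} eps eps' H^{-1}) dH/dtheta_i}, dH/dtheta_i by central differences."""
    H_inv = np.linalg.inv(H_of(theta))
    M = H_inv - H_inv @ np.outer(eps, eps) @ H_inv
    grad = np.empty(len(theta))
    for i in range(len(theta)):
        e = np.zeros(len(theta))
        e[i] = h
        dH = (H_of(theta + e) - H_of(theta - e)) / (2 * h)
        grad[i] = np.trace(M @ dH)
    return grad


def score_numeric(theta, eps, h=1e-6):
    """Central finite difference of ell with respect to each component of theta."""
    grad = np.empty(len(theta))
    for i in range(len(theta)):
        e = np.zeros(len(theta))
        e[i] = h
        grad[i] = (ell(theta + e, eps) - ell(theta - e, eps)) / (2 * h)
    return grad


theta = np.array([1.5, 0.8, 0.3])   # illustrative parameter value
eps = np.array([0.7, -1.2])         # illustrative observation
print(score_trace(theta, eps), score_numeric(theta, eps))
```

The two vectors agree up to finite-difference error, whatever admissible `theta` and `eps` are chosen.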


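Since $E(\epsilon_t\epsilon_t' \mid \mathcal{F}_{t-1}) = H_{0t}$, the relation $\mathrm{Tr}(A'B) = (\mathrm{vec}A)'\mathrm{vec}B$ allows us to deduce that

$$E\left(\frac{\partial^2\ell_t(\theta_0)}{\partial\theta_i\partial\theta_j}\,\Big|\,\mathcal{F}_{t-1}\right) = \mathrm{Tr}\left(H_{0t}^{-1}\frac{\partial H_{0t}}{\partial\theta_i}H_{0t}^{-1}\frac{\partial H_{0t}}{\partial\theta_j}\right) = h_i'h_j,$$

where, using $\mathrm{vec}(ABC) = (C'\otimes A)\mathrm{vec}B$,

$$h_i = \mathrm{vec}\left(H_{0t}^{-1/2}\frac{\partial H_{0t}}{\partial\theta_i}H_{0t}^{-1/2}\right) = \left(H_{0t}^{-1/2}\otimes H_{0t}^{-1/2}\right)d_i, \qquad d_i = \mathrm{vec}\left(\frac{\partial H_{0t}}{\partial\theta_i}\right).$$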
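The two vec identities, and the resulting expression $\mathrm{Tr}(H^{-1}H_iH^{-1}H_j) = h_i'h_j$, can be verified on random matrices. In this sketch the positive definite matrix `H` and the symmetric matrices `Hi`, `Hj` are arbitrary stand-ins for $H_{0t}$ and its partial derivatives (an assumption made only for illustration); $H^{-1/2}$ is computed from a symmetric eigendecomposition.

```python
import numpy as np


def vec(A):
    """Stack the columns of A (column-major order), as the vec operator does."""
    return A.reshape(-1, order="F")


def inv_sqrt_spd(H):
    """Symmetric inverse square root H^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(w ** -0.5) @ V.T


rng = np.random.default_rng(0)
m = 3
A, B, C = (rng.standard_normal((m, m)) for _ in range(3))

G = rng.standard_normal((m, m))
H = G @ G.T + m * np.eye(m)          # stand-in for H_{0t}
Hi = rng.standard_normal((m, m))
Hi = Hi + Hi.T                       # stand-in for dH/dtheta_i
Hj = rng.standard_normal((m, m))
Hj = Hj + Hj.T                       # stand-in for dH/dtheta_j

S = inv_sqrt_spd(H)
h_i = np.kron(S, S) @ vec(Hi)        # h_i = (H^{-1/2} kron H^{-1/2}) d_i
h_j = np.kron(S, S) @ vec(Hj)
H_inv = np.linalg.inv(H)
lhs = np.trace(H_inv @ Hi @ H_inv @ Hj)
print(lhs, h_i @ h_j)
```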
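Introducing the $m^2\times s_0$ matrices

$$h = (h_1\,|\,\cdots\,|\,h_{s_0}) \quad\text{and}\quad d = (d_1\,|\,\cdots\,|\,d_{s_0}),$$

we have $h = \mathbb{H}d$ with $\mathbb{H} = H_{0t}^{-1/2}\otimes H_{0t}^{-1/2}$. Now suppose that $J = Eh'h$ is singular. Then there exists a nonzero vector $c\in\mathbb{R}^{s_0}$ such that $c'Jc = Ec'h'hc = 0$. Since $c'h'hc\ge 0$ almost surely, we have

$$c'h'hc = c'd'\mathbb{H}^2dc = 0 \quad \text{a.s.} \qquad (11.74)$$

Because $\mathbb{H}^2$ is a positive definite matrix, with probability 1, this entails that $dc = 0_{m^2}$ with probability 1. Decompose $c$ into $c = (c_1', c_2')'$ with $c_1\in\mathbb{R}^{s_1}$ and $c_2\in\mathbb{R}^{s_3}$, where $s_3 = s_0 - s_1 = m(m-1)/2$. Rows $1, m+2, 2m+3, \dots, m^2$ (those corresponding to the diagonal entries) of the equations

$$dc = \sum_{i=1}^{s_0}c_i\frac{\partial}{\partial\theta_i}\mathrm{vec}\,H_{0t} = \sum_{i=1}^{s_0}c_i\frac{\partial}{\partial\theta_i}(D_{0t}\otimes D_{0t})\,\mathrm{vec}\,R_0 = 0_{m^2}, \quad \text{a.s.}, \qquad (11.75)$$

give

$$\sum_{i=1}^{s_1}c_i\frac{\partial}{\partial\theta_i}\underline{h}_t(\theta_0) = 0_m, \quad \text{a.s.} \qquad (11.76)$$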


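Differentiating equation (11.48) yields

$$\sum_{i=1}^{s_1}c_i\frac{\partial\underline{h}_t}{\partial\theta_i} = \underline{\omega}^* + \sum_{j=1}^qA_j^*\underline{\epsilon}_{t-j} + \sum_{j=1}^pB_j^*\underline{h}_{t-j} + \sum_{j=1}^pB_j\sum_{i=1}^{s_1}c_i\frac{\partial\underline{h}_{t-j}}{\partial\theta_i},$$

where

$$\underline{\omega}^* = \sum_{i=1}^{s_1}c_i\frac{\partial\underline{\omega}}{\partial\theta_i}, \qquad A_j^* = \sum_{i=1}^{s_1}c_i\frac{\partial A_j}{\partial\theta_i}, \qquad B_j^* = \sum_{i=1}^{s_1}c_i\frac{\partial B_j}{\partial\theta_i}.$$

Because (11.76) is satisfied for all $t$, we have

$$\underline{\omega}_0^* + \sum_{j=1}^qA_{0j}^*\underline{\epsilon}_{t-j} + \sum_{j=1}^pB_{0j}^*\underline{h}_{t-j}(\theta_0) = 0,$$

where quantities evaluated at $\theta = \theta_0$ are indexed by 0. This entails that

$$\underline{h}_t(\theta_0) = \underline{\omega}_0 - \underline{\omega}_0^* + \sum_{j=1}^q(A_{0j} - A_{0j}^*)\underline{\epsilon}_{t-j} + \sum_{j=1}^p(B_{0j} - B_{0j}^*)\underline{h}_{t-j}(\theta_0),$$

and finally, introducing a vector $\theta_1$ whose first $s_1$ components are $\mathrm{vec}(\underline{\omega}_0 - \underline{\omega}_0^*\,|\,A_{01} - A_{01}^*\,|\,\cdots\,|\,B_{0p} - B_{0p}^*)$, we have

$$\underline{h}_t(\theta_0) = \underline{h}_t(\theta_1)$$

by choosing $c_1$ small enough so that $\theta_1\in\Theta$. If $c_1\ne 0$ then $\theta_1\ne\theta_0$. This is in contradiction to the identifiability of the parameter, hence $c_1 = 0$. Equations (11.75) thus become

$$(D_{0t}\otimes D_{0t})\sum_{i=s_1+1}^{s_0}c_i\frac{\partial}{\partial\theta_i}\mathrm{vec}\,R_0 = 0_{m^2}, \quad \text{a.s.}$$

Therefore, since $D_{0t}\otimes D_{0t}$ is invertible,

$$\sum_{i=s_1+1}^{s_0}c_i\frac{\partial}{\partial\theta_i}\mathrm{vec}\,R_0 = 0_{m^2}.$$

Because the vectors $\partial\,\mathrm{vec}R/\partial\theta_i$, $i = s_1+1, \dots, s_0$, are linearly independent, the vector $c_2 = (c_{s_1+1}, \dots, c_{s_0})'$ is null, and thus $c = 0$. This contradicts the choice of a nonzero $c$, and shows that the assumption that $J$ is singular is absurd.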


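(f) Asymptotic irrelevance of the initial values. First remark that (11.55) and the arguments used to show (11.57) and (11.58) entail that

$$\sup_{\theta\in\Theta}\|D_t - \tilde D_t\| \le K\rho^t, \qquad \sup_{\theta\in\Theta}\|\tilde D_t^{-1}\| \le K, \qquad \sup_{\theta\in\Theta}\|D_t^{-1}\| \le K, \qquad (11.77)$$

and thus

$$\sup_{\theta\in\Theta}\|D_t^{1/2} - \tilde D_t^{1/2}\| \le K\rho^t, \qquad \sup_{\theta\in\Theta}\|\tilde D_t^{-1/2}\| \le K, \qquad \sup_{\theta\in\Theta}\|D_t^{-1/2}\| \le K,$$
$$\sup_{\theta\in\Theta}\|\tilde D_t^{1/2}D_t^{-1/2}\| \le K(1 + \rho^t), \qquad \sup_{\theta\in\Theta}\|D_t^{1/2}\tilde D_t^{-1/2}\| \le K(1 + \rho^t). \qquad (11.78)$$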


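From (11.52), we have

$$\underline{H}_t = \sum_{k=0}^{t-r-1}\mathbb{B}^k\underline{c}_{t-k} + \mathbb{B}^{t-r}\underline{H}_r, \qquad \tilde{\underline{H}}_t = \sum_{k=0}^{t-r-1}\mathbb{B}^k\tilde{\underline{c}}_{t-k} + \mathbb{B}^{t-r}\tilde{\underline{H}}_r,$$

where $r = \max\{p, q\}$ and the tilde means that initial values are taken into account. Since $\tilde{\underline{c}}_t = \underline{c}_t$ for all $t > r$, we have $\underline{H}_t - \tilde{\underline{H}}_t = \mathbb{B}^{t-r}(\underline{H}_r - \tilde{\underline{H}}_r)$ and

$$\frac{\partial}{\partial\theta_i}(\underline{H}_t - \tilde{\underline{H}}_t) = \mathbb{B}^{t-r}\frac{\partial}{\partial\theta_i}(\underline{H}_r - \tilde{\underline{H}}_r) + \sum_{j=1}^{t-r}\mathbb{B}^{j-1}\frac{\partial\mathbb{B}}{\partial\theta_i}\mathbb{B}^{t-r-j}(\underline{H}_r - \tilde{\underline{H}}_r).$$

Thus (11.54) entails that

$$\sup_{\theta\in\Theta}\left\|\frac{\partial}{\partial\theta_i}(D_t - \tilde D_t)\right\| \le K\rho^t. \qquad (11.79)$$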
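The derivative of a matrix power used above, $\partial\mathbb{B}^n/\partial\theta_i = \sum_{j=1}^n\mathbb{B}^{j-1}(\partial\mathbb{B}/\partial\theta_i)\mathbb{B}^{n-j}$, can be checked against a finite difference. The matrix function `B_of` below is a hypothetical smooth example, not the $\mathbb{B}$ of (11.52).

```python
import numpy as np


def B_of(theta):
    """Hypothetical 2x2 matrix depending smoothly on a scalar parameter theta."""
    return np.array([[0.5 * theta, 0.1], [0.2 * theta**2, 0.3]])


def dB_of(theta):
    """Exact derivative of B_of with respect to theta."""
    return np.array([[0.5, 0.0], [0.4 * theta, 0.0]])


def d_power(B, dB, n):
    """Derivative of B^n: sum over j = 1..n of B^{j-1} (dB/dtheta) B^{n-j}."""
    out = np.zeros_like(B)
    for j in range(1, n + 1):
        out += np.linalg.matrix_power(B, j - 1) @ dB @ np.linalg.matrix_power(B, n - j)
    return out


theta, n, h = 0.8, 6, 1e-6
formula = d_power(B_of(theta), dB_of(theta), n)
numeric = (np.linalg.matrix_power(B_of(theta + h), n)
           - np.linalg.matrix_power(B_of(theta - h), n)) / (2 * h)
print(np.max(np.abs(formula - numeric)))
```

Since $\mathbb{B}$ and its derivative do not commute in general, the sum cannot be collapsed into $n\mathbb{B}^{n-1}\partial\mathbb{B}/\partial\theta_i$; this is why the bound in (11.79) is obtained term by term.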


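Because

$$D_t^{-1} - \tilde D_t^{-1} = D_t^{-1}(\tilde D_t - D_t)\tilde D_t^{-1},$$

we thus have, using (11.77) and (11.79),

$$\sup_{\theta\in\Theta}\|D_t^{-1} - \tilde D_t^{-1}\| \le K\rho^t, \qquad \sup_{\theta\in\Theta}\left\|\frac{\partial}{\partial\theta_i}(D_t^{-1} - \tilde D_t^{-1})\right\| \le K\rho^t. \qquad (11.80)$$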


Denoting by $h_t(i_1)$ the $i_1$th component of $\underline{h}_t(\theta)$, we have

$$h_{0t}(i_1) = c_0 + \sum_{k=0}^{\infty}\sum_{j_1=1}^m\sum_{j_2=1}^m\sum_{i=1}^qA_{0i}(j_1, j_2)\,\mathbb{B}_0^k(i_1, j_1)\,\epsilon^2_{j_2, t-k-i},$$

where $c_0$ is a strictly positive constant and, by the usual convention, the index 0 corresponds to quantities evaluated at $\theta = \theta_0$. For a sufficiently small neighborhood $V(\theta_0)$ of $\theta_0$, we have

$$\sup_{\theta\in V(\theta_0)}\frac{A_{0i}(j_1, j_2)}{A_i(j_1, j_2)} < K, \qquad \sup_{\theta\in V(\theta_0)}\frac{\mathbb{B}_0^k(i_1, j_1)}{\mathbb{B}^k(i_1, j_1)} < (1 + \delta)^k$$

for all $i_1, j_1, j_2\in\{1, \dots, m\}$ and all $\delta > 0$. Moreover, in $h_t(i_1)$, the coefficient of $\mathbb{B}^k(i_1, j_1)\epsilon^2_{j_2, t-k-i}$ is bounded below by a constant $c > 0$ uniformly on $\theta\in V(\theta_0)$. We thus have


ho, (i1) o h g A (Bi, Je aks 
stih <K+K j 
ith) = LALL rer eae 


k=0 j=l j=l i=l 


ee 
BK (it, JE}, pki 


m q 


CO 
<K+K > X Ya + DE aE 


jh=1 i=1 k=0 


for some p € [0, 1), all ô > 0 and all s € [0, 1]. Corollary 11.2 then implies that, for all rọ > 0, 


ho (i) | 
E sup foi) 
bevo) | 2, 41) 
From this we deduce that 
—1/2 —1/2 ,1/2~ 
E sup |D; el? =£ sup ||D; D il? < 00, (11.81) 
OEV) OEV) 
~ —1/2 t -1/2 
sup ||D; “el < d+ Kp’) sup |D, “ell. (11.82) 
GEV) OEV) 


The last inequality follows from (11.77) because 
D Pe _ D” (a? = D,'”) ps = D Pe. 
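By (11.67) and (11.68),

$$\frac{\partial\ell_t(\theta)}{\partial\theta_i} - \frac{\partial\tilde\ell_t(\theta)}{\partial\theta_i} = \mathrm{Tr}(c_1 + c_2 + c_3),$$

where

$$c_1 = -R^{-1}\left(D_t^{-1} - \tilde D_t^{-1}\right)\frac{\partial D_t}{\partial\theta_i}D_t^{-1}\epsilon_t\epsilon_t'D_t^{-1}, \qquad c_2 = -R^{-1}\tilde D_t^{-1}\left(\frac{\partial D_t}{\partial\theta_i} - \frac{\partial\tilde D_t}{\partial\theta_i}\right)D_t^{-1}\epsilon_t\epsilon_t'D_t^{-1},$$

and $c_3$ contains terms which can be handled as $c_1$ and $c_2$. Using (11.77)–(11.82), the Cauchy–Schwarz inequality, and

$$E\sup_{\theta\in V(\theta_0)}\left\|D_t^{-1/2}\frac{\partial D_t}{\partial\theta_i}\right\|^2 < \infty,$$

which follows from (11.71), we obtain

$$\sup_{\theta\in V(\theta_0)}\left|\frac{\partial\ell_t(\theta)}{\partial\theta_i} - \frac{\partial\tilde\ell_t(\theta)}{\partial\theta_i}\right| \le K\rho^tu_t,$$

where $u_t$ is an integrable variable. From the Markov inequality, $n^{-1/2}\sum_{t=1}^n\rho^tu_t = o_P(1)$, which implies that

$$n^{-1/2}\left\|\sum_{t=1}^n\left\{\frac{\partial\ell_t(\theta_0)}{\partial\theta} - \frac{\partial\tilde\ell_t(\theta_0)}{\partial\theta}\right\}\right\| = o_P(1).$$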
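We have in fact shown that this convergence is uniform on a neighborhood of $\theta_0$, but this is of no direct use for what follows. By exactly the same arguments,

$$\sup_{\theta\in V(\theta_0)}\left|\frac{\partial^2\ell_t(\theta)}{\partial\theta_i\partial\theta_j} - \frac{\partial^2\tilde\ell_t(\theta)}{\partial\theta_i\partial\theta_j}\right| \le K\rho^tu_t^*,$$

where $u_t^*$ is an integrable random variable, which entails that

$$n^{-1}\sum_{t=1}^n\sup_{\theta\in V(\theta_0)}\left\|\frac{\partial^2\ell_t(\theta)}{\partial\theta\partial\theta'} - \frac{\partial^2\tilde\ell_t(\theta)}{\partial\theta\partial\theta'}\right\| = O_P(n^{-1}) = o_P(1).$$

It now suffices to observe that the analogs of steps (a)–(f) in Section 7.4 have been verified, and we are done.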
posed, in particular, by Tse (2002), Duchesne and Lalancette (2003).

Bardet and Wintenberger (2009) established the strong consistency and asymptotic normality 
Among models not studied in this book are the spline GARCH models in which the volatility is
written as a product of a slowly varying deterministic component and a GARCH-type component. 
These models were introduced by Engle and Rangel (2008), and their multivariate generalization 
is due to Hafner and Linton (2010). 


11.6 Exercises 


11.1 (More or less parsimonious representations)
Compare the number of parameters of the various GARCH(p, q) representations, as a function of the dimension $m$.

11.2 (Identifiability of a matrix rational fraction)
Let $A_\theta(z)$, $B_\theta(z)$, $A_{\theta_0}(z)$ and $B_{\theta_0}(z)$ denote square matrices of polynomials. Show that

$$B_\theta(z)^{-1}A_\theta(z) = B_{\theta_0}(z)^{-1}A_{\theta_0}(z) \qquad (11.83)$$

for all $z$ such that $\det B_\theta(z)B_{\theta_0}(z)\ne 0$ if and only if there exists an operator $U(z)$ such that

$$A_\theta(z) = U(z)A_{\theta_0}(z) \quad\text{and}\quad B_\theta(z) = U(z)B_{\theta_0}(z). \qquad (11.84)$$

11.3 (Two independent nondegenerate random variables cannot be equal)
Let $X$ and $Y$ be two independent real random variables such that $Y = X$ almost surely. We aim to prove that $X$ and $Y$ are almost surely constant.

1. Suppose that $\mathrm{Var}(X)$ exists. Compute $\mathrm{Var}(X)$ and show the stated result in this case.

2. Suppose that $X$ is discrete and $P(X = x_1)P(X = x_2)\ne 0$. Show that necessarily $x_1 = x_2$ and show the result in this case.

3. Prove the result in the general case.

11.4 (Duplication and elimination)
Consider the duplication matrix $D_m$ and the elimination matrix $D_m^+$ defined by

$$\mathrm{vec}(A) = D_m\,\mathrm{vech}(A), \qquad \mathrm{vech}(A) = D_m^+\,\mathrm{vec}(A),$$

where $A$ is any symmetric $m\times m$ matrix. Show that

$$D_m^+D_m = I_{m(m+1)/2}.$$

11.5 (Norm and spectral radius)
Show that

$$\|A\| := \sup_{\|x\|\le 1}\|Ax\| = \rho^{1/2}(A'A).$$
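Exercises 11.4 and 11.5 can be explored numerically before being proved. The sketch below builds $D_m$ explicitly and, as an assumption, takes the Moore–Penrose inverse of $D_m$ as the elimination matrix $D_m^+$ (one admissible choice; the exercise only requires $\mathrm{vech}(A) = D_m^+\mathrm{vec}(A)$ for symmetric $A$). It also compares the operator norm with $\rho^{1/2}(A'A)$.

```python
import numpy as np


def vech(A):
    """Stack the entries on and below the diagonal of A, column by column."""
    m = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(m)])


def vec(A):
    """Stack the columns of A."""
    return A.reshape(-1, order="F")


def duplication_matrix(m):
    """D_m with vec(A) = D_m vech(A) for every symmetric m x m matrix A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    k = 0
    for j in range(m):
        for i in range(j, m):
            D[i + j * m, k] = 1.0  # entry (i, j) in the column-major vec
            D[j + i * m, k] = 1.0  # symmetric entry (j, i); same slot when i == j
            k += 1
    return D


m = 4
rng = np.random.default_rng(0)
G = rng.standard_normal((m, m))
A = G + G.T                        # a symmetric test matrix
D = duplication_matrix(m)
D_plus = np.linalg.pinv(D)         # Moore-Penrose inverse used as elimination matrix

# Exercise 11.5: operator norm versus the square root of the spectral radius of G'G
op_norm = np.linalg.norm(G, 2)
root_rho = np.sqrt(np.max(np.linalg.eigvalsh(G.T @ G)))
print(op_norm, root_rho)
```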


11.6 (Elementary results on matrix norms)
Show the equalities and inequalities of (11.50)–(11.51).

11.7 (Scalar GARCH)
The scalar GARCH model has a volatility of the form

$$H_t = \Omega + \sum_{i=1}^q\alpha_i\epsilon_{t-i}\epsilon_{t-i}' + \sum_{j=1}^p\beta_jH_{t-j},$$

where the $\alpha_i$ and $\beta_j$ are positive numbers. Give the positivity and second-order stationarity conditions.

11.8 (Condition for the $L^p$ and almost sure convergence)
Let $p\in[1, \infty[$ and let $(u_n)$ be a sequence of real random variables of $L^p$ such that

$$E|u_n - u_{n-1}|^p \le C\rho^n,$$

for some positive constant $C$ and some constant $\rho$ in $]0, 1[$. Prove that $u_n$ converges almost surely and in $L^p$ to some random variable $u$ of $L^p$.

11.9 (An average of correlation matrices is a correlation matrix)
Let $R$ and $Q$ be two correlation matrices of the same size and let $p\in[0, 1]$. Show that $pR + (1 - p)Q$ is a correlation matrix.

11.10 (Factors as linear combinations of individual returns)
Consider the factor model

$$\mathrm{Var}(\epsilon_t \mid \epsilon_u, u < t) = \Omega + \sum_{j=1}^r\lambda_{jt}\beta_j\beta_j',$$

where the $\beta_j$ are linearly independent. Show that there exist vectors $\alpha_j$ such that

$$\mathrm{Var}(\epsilon_t \mid \epsilon_u, u < t) = \Omega^* + \sum_{j=1}^r\lambda_{jt}^*\beta_j\beta_j',$$

where the $\lambda_{jt}^*$ are conditional variances of the portfolios $\alpha_j'\epsilon_t$. Compute the conditional covariance between these factors.

11.11 (BEKK representation of factor models)
Consider the factor model

$$H_t = \Omega + \sum_{j=1}^r\lambda_{jt}\beta_j\beta_j', \qquad \lambda_{jt} = \omega_j + a_j\epsilon_{j,t-1}^2 + b_j\lambda_{j,t-1},$$

where the $\beta_j$ are linearly independent, $\omega_j > 0$, $a_j > 0$ and $0 < b_j < 1$ for $j = 1, \dots, r$. Show that a BEKK representation holds, of the form

$$H_t = \Omega^* + \sum_{k=1}^KA_k\epsilon_{t-1}\epsilon_{t-1}'A_k' + \sum_{k=1}^KB_kH_{t-1}B_k'.$$

11.12 (PCA of a covariance matrix)
Let $X$ be a random vector of $\mathbb{R}^m$ with variance matrix $\Sigma$.

1. Find the (or a) first principal component of $X$, that is, a random variable $C^1 = u_1'X$ of maximal variance, where $u_1'u_1 = 1$. Is $C^1$ unique?

2. Find the second principal component, that is, a random variable $C^2 = u_2'X$ of maximal variance, where $u_2'u_2 = 1$ and $\mathrm{Cov}(C^1, C^2) = 0$.

3. Find all the principal components.
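A quick numerical experiment can guide the solutions of Exercises 11.9 and 11.12. In this sketch the correlation matrices are obtained by normalizing random covariance matrices (an arbitrary construction chosen for illustration), and the principal directions are taken, as the exercise suggests, from the eigendecomposition of $\Sigma$ sorted by decreasing eigenvalue.

```python
import numpy as np


def random_correlation(m, rng):
    """Correlation matrix obtained by normalizing a random covariance matrix."""
    G = rng.standard_normal((m, m + 2))
    C = G @ G.T
    d = 1.0 / np.sqrt(np.diag(C))
    return C * np.outer(d, d)


def is_correlation(M, tol=1e-10):
    """Symmetric, unit diagonal and positive semi-definite (up to tol)."""
    return bool(np.allclose(M, M.T) and np.allclose(np.diag(M), 1.0)
                and np.min(np.linalg.eigvalsh(M)) > -tol)


def principal_components(Sigma):
    """Eigenvalues of Sigma in decreasing order and the matching unit eigenvectors (columns)."""
    w, U = np.linalg.eigh(Sigma)
    order = np.argsort(w)[::-1]
    return w[order], U[:, order]


rng = np.random.default_rng(2)
R, Q = random_correlation(4, rng), random_correlation(4, rng)
averages_ok = [is_correlation(p * R + (1 - p) * Q) for p in (0.0, 0.25, 0.5, 1.0)]

G = rng.standard_normal((4, 4))
Sigma = G @ G.T + np.eye(4)
lam, U = principal_components(Sigma)
# Var(u_k'X) = u_k' Sigma u_k = lam_k and Cov(u_i'X, u_j'X) = 0 for i != j
cov_pc = U.T @ Sigma @ U
print(averages_ok, lam)
```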
11.13 (BEKK-GARCH models with a diagonal representation)
Show that the matrices $A^{(i)}$ and $B^{(j)}$ defined in (11.21) are diagonal when the matrices $A_{ik}$ and $B_{jk}$ are diagonal.

11.14 (Determinant of a block companion matrix)
If $A$ and $D$ are square matrices, with $D$ invertible, we have

$$\det\begin{pmatrix}A & B\\ C & D\end{pmatrix} = \det(D)\det(A - BD^{-1}C).$$

Use this property to show that matrix $\mathbb{B}$ in Corollary 11.1 satisfies

$$\det(\mathbb{B} - \lambda I_{mp}) = (-1)^{mp}\det\{\lambda^pI_m - \lambda^{p-1}B_1 - \cdots - \lambda B_{p-1} - B_p\}.$$

11.15 (Eigenvalues of a product of positive definite matrices)
Let $A$ and $B$ denote symmetric positive definite matrices of the same size. Show that $AB$ is diagonalizable and that its eigenvalues are positive.

11.16 (Positive definiteness of a sum of positive semi-definite matrices)
Consider two matrices of the same size, symmetric and positive semi-definite, of the form

$$A = \begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix}B_{11} & 0\\ 0 & 0\end{pmatrix},$$

where $A_{11}$ and $B_{11}$ are also square matrices of the same size. Show that if $A_{22}$ and $B_{11}$ are positive definite, then so is $A + B$.

11.17 (Positive definite matrix and almost surely positive definite matrix)
Let $A$ be a symmetric random matrix such that for all real vectors $c\ne 0$,

$$c'Ac > 0 \quad \text{almost surely}.$$

Show that this does not entail that $A$ is almost surely positive definite.
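Exercises 11.14 and 11.15 can likewise be checked on random matrices. For the companion part, the sketch assumes that $\mathbb{B}$ has the usual block companion form, with first block row $(B_1, \dots, B_p)$ and identity blocks $I_m$ on the block subdiagonal; the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)


def schur_det(A, B, C, D):
    """det([[A, B], [C, D]]) = det(D) det(A - B D^{-1} C) for invertible D."""
    return np.linalg.det(D) * np.linalg.det(A - B @ np.linalg.solve(D, C))


def block_companion(Bs):
    """Assumed block companion form: first block row (B_1 ... B_p), identity blocks below."""
    p, m = len(Bs), Bs[0].shape[0]
    top = np.hstack(Bs)
    bottom = np.hstack([np.eye(m * (p - 1)), np.zeros((m * (p - 1), m))])
    return np.vstack([top, bottom])


def random_spd(n):
    """Random symmetric positive definite matrix."""
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)


# Exercise 11.14: Schur complement and characteristic polynomial of the companion matrix
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))
m, p, lam = 2, 3, 0.7
Bs = [0.3 * rng.standard_normal((m, m)) for _ in range(p)]
lhs = np.linalg.det(block_companion(Bs) - lam * np.eye(m * p))
poly = lam**p * np.eye(m) - sum(lam ** (p - k) * Bs[k - 1] for k in range(1, p + 1))
rhs = (-1) ** (m * p) * np.linalg.det(poly)

# Exercise 11.15: AB has real positive eigenvalues when A and B are symmetric positive definite
eig_AB = np.linalg.eigvals(random_spd(4) @ random_spd(4))
print(lhs, rhs, eig_AB)
```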
Financial Applications


In this chapter we discuss several financial applications of GARCH models. In connecting these 
models with those frequently used in mathematical finance, one is faced with the problem that the 
latter are generally written in continuous time. We start by studying the relation between GARCH 
and continuous-time processes. We present sufficient conditions for a sequence of stochastic dif- 
ference equations to converge in distribution to a stochastic differential equation as the length 
of the discrete time intervals between observations goes to zero. We then apply these results to 
GARCH(1, 1)-type models. The second part of this chapter is devoted to the pricing of deriva- 
tives. We introduce the notion of the stochastic discount factor and show how it can be used in 
the GARCH framework. The final part of the chapter is devoted to risk measurement. 


12.1 Relation between GARCH and Continuous-Time Models


Continuous-time models are central to mathematical finance. Most theoretical results on derivative 
pricing rely on continuous-time processes, obtained as solutions of diffusion equations. However, 
discrete-time models are the most widely used in applications. The literature on discrete-time 
models and that on continuous-time models developed independently, but it is possible to establish 
connections between the two approaches. 


12.1.1 Some Properties of Stochastic Differential Equations 


This first section reviews basic material from diffusion processes, which will be known to many readers. On some probability space $(\Omega, \mathcal{A}, P)$, a $d$-dimensional process $\{W_t;\ 0\le t<\infty\}$ is called standard Brownian motion if $W_0 = 0$ a.s. and, for $s < t$, the increment $W_t - W_s$ is independent of $\sigma\{W_u;\ u\le s\}$ and is $\mathcal{N}(0, (t - s)I_d)$ distributed, where $I_d$ is the $d\times d$ identity matrix. Brownian motion is a Gaussian process and admits a version with continuous paths.

A stochastic differential equation (SDE) in $\mathbb{R}^p$ is an equation of the form

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad 0\le t<\infty, \quad X_0 = x_0, \qquad (12.1)$$
where $x_0\in\mathbb{R}^p$, and $\mu$ and $\sigma$ are measurable functions, defined on $\mathbb{R}^p$ and respectively taking values in $\mathbb{R}^p$ and $\mathcal{M}_{p\times d}$, the space of $p\times d$ matrices. Here we only consider time-homogeneous SDEs, in which the functions $\mu$ and $\sigma$ do not depend on $t$. A process $(X_t)_{t\in[0,T]}$ is a solution of this equation, and is called a diffusion process, if it satisfies

$$X_t = x_0 + \int_0^t\mu(X_s)\,ds + \int_0^t\sigma(X_s)\,dW_s.$$


Existence and uniqueness of a solution require additional conditions on the functions $\mu$ and $\sigma$. The simplest conditions require Lipschitz and sublinearity properties:

$$\|\mu(x) - \mu(y)\| + \|\sigma(x) - \sigma(y)\| \le K\|x - y\|,$$
$$\|\mu(x)\|^2 + \|\sigma(x)\|^2 \le K^2(1 + \|x\|^2),$$

where $t\in[0, +\infty[$, $x, y\in\mathbb{R}^p$, and $K$ is a positive constant. In these inequalities, $\|\cdot\|$ denotes a norm on either $\mathbb{R}^p$ or $\mathcal{M}_{p\times d}$. These hypotheses also ensure the 'nonexplosion' of the solution on every time interval of the form $[0, T]$ with $T > 0$ (see Karatzas and Shreve, 1988, Theorem 5.2.9). They can be considerably weakened, in particular when $p = d = 1$. The term $\mu(X_t)$ is called the drift of the diffusion, and the term $\sigma(X_t)$ is called the volatility. They have the following interpretation:

$$\mu(x) = \lim_{\tau\to 0}\tau^{-1}E(X_{t+\tau} - X_t \mid X_t = x), \qquad (12.2)$$

$$\sigma(x)\sigma(x)' = \lim_{\tau\to 0}\tau^{-1}\mathrm{Var}(X_{t+\tau} - X_t \mid X_t = x). \qquad (12.3)$$


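These relations can be generalized using the second-order differential operator defined, in the case $p = d = 1$, by

$$L = \mu(x)\frac{\partial}{\partial x} + \frac{1}{2}\sigma^2(x)\frac{\partial^2}{\partial x^2}.$$

Indeed, for a class of twice continuously differentiable functions $f$, we have

$$Lf(x) = \lim_{\tau\to 0}\tau^{-1}\{E(f(X_{t+\tau}) \mid X_t = x) - f(x)\}.$$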

Moreover, the following property holds: if $\phi$ is a twice continuously differentiable function with compact support, then the process

$$Y_t = \phi(X_t) - \phi(X_0) - \int_0^tL\phi(X_s)\,ds$$

is a martingale with respect to the filtration $(\mathcal{F}_t)$, where $\mathcal{F}_t$ is the σ-field generated by $\{W_s, s\le t\}$. This result admits a converse which provides a useful characterization of diffusions. Indeed, it can be shown that if, for a process $(X_t)$, the process $(Y_t)$ just defined is an $\mathcal{F}_t$-martingale for a class of sufficiently smooth functions $\phi$, then $(X_t)$ is a diffusion and solves (12.1).


Stationary Distribution 


In certain cases the solution of an SDE admits a stationary distribution, but in general this distribution is not available in explicit form. Wong (1964) showed that, for model (12.1) in the univariate case ($p = d = 1$) with $\sigma(\cdot) > 0$, if there exists a function $f$ that solves the equation

$$f(x)\mu(x) = \frac{1}{2}\frac{d}{dx}\{f(x)\sigma^2(x)\}, \qquad (12.4)$$

and belongs to the Pearson family of distributions, that is, of the form

$$f(x) = \lambda x^ae^{b/x}, \qquad x > 0, \qquad (12.5)$$

where $a < -1$ and $b < 0$, then (12.1) admits a stationary solution with density $f$.


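Example 12.1 (Linear model) A linear SDE is an equation of the form

$$dX_t = (\omega + \mu X_t)\,dt + \sigma X_t\,dW_t, \qquad (12.6)$$

where $\omega$, $\mu$ and $\sigma$ are constants. Given an initial value $x_0$, this equation admits a strictly positive solution if $\omega\ge 0$, $x_0\ge 0$ and $(\omega, x_0)\ne(0, 0)$ (Exercise 12.1). If $f$ is assumed to be of the form (12.5), solving (12.4) leads to

$$a = -2\left(1 - \frac{\mu}{\sigma^2}\right), \qquad b = -\frac{2\omega}{\sigma^2}.$$

Under the constraints

$$\omega > 0, \qquad \sigma\ne 0, \qquad \phi := 1 - \frac{2\mu}{\sigma^2} > 0,$$

we obtain the stationary density

$$f(x) = \frac{1}{\Gamma(\phi)}\left(\frac{2\omega}{\sigma^2}\right)^{\phi}\exp\left(-\frac{2\omega}{\sigma^2x}\right)x^{-1-\phi}, \qquad x > 0,$$

where $\Gamma$ denotes the gamma function. If this distribution is chosen for the initial distribution (that is, the law of $X_0$), then the process $(X_t)$ is stationary and its inverse follows a gamma distribution,¹

$$\frac{1}{X_t}\sim\Gamma\left(\frac{2\omega}{\sigma^2}, \phi\right). \qquad (12.7)$$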
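The stationary density of Example 12.1 can be checked directly against Wong's equation (12.4). The sketch below evaluates the residual $f(x)\mu(x) - \frac{1}{2}\{f(x)\sigma^2(x)\}'$ by a central difference, with drift $\omega + \mu x$ and volatility $\sigma x$, and checks that $f$ integrates to one through the change of variable $y = 1/x$ suggested by (12.7). The parameter values are illustrative only.

```python
import math


def stationary_density(x, omega, mu, sigma):
    """Inverse-gamma density of Example 12.1 (requires omega > 0 and phi = 1 - 2 mu / sigma^2 > 0)."""
    phi = 1.0 - 2.0 * mu / sigma**2
    scale = 2.0 * omega / sigma**2
    return scale**phi / math.gamma(phi) * x ** (-1.0 - phi) * math.exp(-scale / x)


def wong_residual(x, omega, mu, sigma, h=1e-5):
    """f(x) (omega + mu x) - (1/2) d/dx {f(x) (sigma x)^2}, which should vanish."""
    def g(y):
        return stationary_density(y, omega, mu, sigma) * (sigma * y) ** 2
    dg = (g(x + h) - g(x - h)) / (2.0 * h)
    return stationary_density(x, omega, mu, sigma) * (omega + mu * x) - 0.5 * dg


def total_mass(omega, mu, sigma, y_max=60.0, n=100_000):
    """Integral of f over (0, inf), computed in y = 1/x with the midpoint rule on (0, y_max]."""
    dy = y_max / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * dy
        total += stationary_density(1.0 / y, omega, mu, sigma) / y**2 * dy
    return total


omega, mu, sigma = 1.0, -1.0, 1.0   # illustrative values: phi = 3 and 2 omega / sigma^2 = 2
print([wong_residual(x, omega, mu, sigma) for x in (0.3, 1.0, 2.5)], total_mass(omega, mu, sigma))
```

The truncation at `y_max` is harmless here because, in the variable $y = 1/x$, the integrand is a $\Gamma(2, 3)$ density in the notation of footnote 1, whose tail beyond 60 is negligible.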


12.1.2 Convergence of Markov Chains to Diffusions 


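Consider a Markov chain $Z^{(\tau)} = (Z^{(\tau)}_{k\tau})_{k\in\mathbb{N}}$ with values in $\mathbb{R}^d$, indexed by the time unit $\tau > 0$. We transform $Z^{(\tau)}$ into a continuous-time process, $(Z^{(\tau)}_t)_{t\in\mathbb{R}^+}$, by means of the time interpolation

$$Z^{(\tau)}_t = Z^{(\tau)}_{k\tau} \quad\text{if } k\tau\le t < (k+1)\tau.$$

Under conditions given in the next theorem, the process $(Z^{(\tau)}_t)$ converges in distribution to a diffusion. Denote by $\|\cdot\|$ the Euclidean norm on $\mathbb{R}^d$.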


¹ The $\Gamma(a, b)$ density, for $a, b > 0$, is defined on $\mathbb{R}^+$ by $f(x) = \dfrac{a^b}{\Gamma(b)}e^{-ax}x^{b-1}$.
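Theorem 12.1 (Convergence of $(Z^{(\tau)}_t)$ to a diffusion) Suppose there exist continuous mappings $\mu$ and $\sigma$ from $\mathbb{R}^d$ to $\mathbb{R}^d$ and $\mathcal{M}_{d\times d}$ respectively, such that for all $r > 0$ and for some $\delta > 0$,

$$\lim_{\tau\to 0}\sup_{\|z\|\le r}\left\|\tau^{-1}E\big(Z^{(\tau)}_{(k+1)\tau} - Z^{(\tau)}_{k\tau} \mid Z^{(\tau)}_{k\tau} = z\big) - \mu(z)\right\| = 0, \qquad (12.8)$$

$$\lim_{\tau\to 0}\sup_{\|z\|\le r}\left\|\tau^{-1}\mathrm{Var}\big(Z^{(\tau)}_{(k+1)\tau} - Z^{(\tau)}_{k\tau} \mid Z^{(\tau)}_{k\tau} = z\big) - \sigma(z)\sigma(z)'\right\| = 0, \qquad (12.9)$$

$$\limsup_{\tau\to 0}\sup_{\|z\|\le r}\tau^{-(2+\delta)/2}E\big(\|Z^{(\tau)}_{(k+1)\tau} - Z^{(\tau)}_{k\tau}\|^{2+\delta} \mid Z^{(\tau)}_{k\tau} = z\big) < \infty. \qquad (12.10)$$

Then, if the equation

$$dZ_t = \mu(Z_t)\,dt + \sigma(Z_t)\,dW_t, \qquad 0\le t<\infty, \quad Z_0 = z_0, \qquad (12.11)$$

admits a solution $(Z_t)$ which is unique in distribution, and if $Z^{(\tau)}_0$ converges in distribution to $z_0$, then the process $(Z^{(\tau)}_t)$ converges in distribution to $(Z_t)$.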


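Remark 12.1 Condition (12.10) ensures, in particular, by applying the Markov inequality, that for all $\epsilon > 0$,

$$\lim_{\tau\to 0}\tau^{-1}P\big(\|Z^{(\tau)}_{(k+1)\tau} - Z^{(\tau)}_{k\tau}\| > \epsilon \mid Z^{(\tau)}_{k\tau} = z\big) = 0,$$

since this conditional probability is bounded by $\epsilon^{-(2+\delta)}E\big(\|Z^{(\tau)}_{(k+1)\tau} - Z^{(\tau)}_{k\tau}\|^{2+\delta} \mid Z^{(\tau)}_{k\tau} = z\big) = O(\tau^{1+\delta/2})$. As a consequence, the limiting process has continuous paths.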


Euler Discretization of a Diffusion 


Diffusion processes do not admit an exact discretization in general. An exception is the geometric Brownian motion, defined as a solution of the real SDE

$$dX_t = \mu X_t\,dt + \sigma X_t\,dW_t, \qquad (12.12)$$

where $\mu$ and $\sigma$ are constants. It can be shown that if the initial value $x_0$ is strictly positive, then $X_t\in(0, \infty)$ for any $t > 0$. By Itô's lemma,² we obtain

$$d\log(X_t) = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t, \qquad (12.13)$$

and then, by integration of this equation between times $k\tau$ and $(k+1)\tau$, we get the discretized version of model (12.12),

$$\log X_{(k+1)\tau} = \log X_{k\tau} + \left(\mu - \frac{\sigma^2}{2}\right)\tau + \sqrt{\tau}\,\sigma\,\epsilon_{(k+1)\tau}, \qquad (\epsilon_{k\tau})\ \text{iid}\ \mathcal{N}(0, 1). \qquad (12.14)$$
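Because (12.14) is exact, geometric Brownian motion can be simulated on the grid $k\tau$ without discretization error. The sketch below does so with NumPy; the values of $\mu$, $\sigma$ and $\tau$ (daily steps in annual units) are illustrative assumptions, and the checks compare the sample mean and variance of the log-increments with $(\mu - \sigma^2/2)\tau$ and $\sigma^2\tau$.

```python
import numpy as np


def gbm_exact(x0, mu, sigma, tau, n_steps, rng):
    """Exact discretization (12.14) of geometric Brownian motion on the grid k * tau."""
    eps = rng.standard_normal(n_steps)
    log_increments = (mu - 0.5 * sigma**2) * tau + np.sqrt(tau) * sigma * eps
    return x0 * np.exp(np.concatenate([[0.0], np.cumsum(log_increments)]))


rng = np.random.default_rng(42)
mu, sigma, tau = 0.05, 0.2, 1.0 / 252
path = gbm_exact(100.0, mu, sigma, tau, 200_000, rng)
log_ret = np.diff(np.log(path))
print(log_ret.mean(), (mu - 0.5 * sigma**2) * tau, log_ret.var(), sigma**2 * tau)
```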


For general diffusions, an explicit discretized model does not exist but a natural approximation, called the Euler discretization, is obtained by replacing the differential elements by increments. The Euler discretization of the SDE (12.1) is then given, for the time unit $\tau$, by

$$X_{(k+1)\tau} = X_{k\tau} + \tau\mu(X_{k\tau}) + \sqrt{\tau}\,\sigma(X_{k\tau})\,\epsilon_{(k+1)\tau}, \qquad (\epsilon_{k\tau})\ \text{iid}\ \mathcal{N}(0, 1). \qquad (12.15)$$

² For $Y_t = f(t, X_t)$, where $f: (t, x)\in[0, T]\times\mathbb{R}\mapsto f(t, x)\in\mathbb{R}$ is continuous, continuously differentiable with respect to the first component and twice continuously differentiable with respect to the second component, if $(X_t)$ satisfies $dX_t = \mu_t\,dt + \sigma_t\,dW_t$, where $\mu_t$ and $\sigma_t$ are adapted processes such that a.s. $\int_0^T|\mu_t|\,dt < \infty$ and $\int_0^T\sigma_t^2\,dt < \infty$, we have

$$dY_t = \frac{\partial f}{\partial t}(t, X_t)\,dt + \frac{\partial f}{\partial x}(t, X_t)\,dX_t + \frac{1}{2}\frac{\partial^2f}{\partial x^2}(t, X_t)\,\sigma_t^2\,dt.$$


The Euler discretization of a diffusion converges in distribution to this diffusion (Exercise 12.2). 
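A generic implementation of the Euler scheme (12.15) only needs the drift and volatility functions. The sketch below (function name and parameter values are illustrative) applies it to the geometric Brownian motion (12.12) and compares the terminal value with the exact solution driven by the same Gaussian shocks; for small $\tau$ the two are close, as the convergence result suggests.

```python
import numpy as np


def euler_path(x0, drift, vol, tau, eps):
    """Euler scheme (12.15): X_{(k+1)tau} = X_{k tau} + tau mu(X_{k tau}) + sqrt(tau) sigma(X_{k tau}) eps."""
    x = np.empty(len(eps) + 1)
    x[0] = x0
    for k, e in enumerate(eps):
        x[k + 1] = x[k] + tau * drift(x[k]) + np.sqrt(tau) * vol(x[k]) * e
    return x


rng = np.random.default_rng(7)
mu, sigma, x0, T, n = 0.05, 0.2, 1.0, 1.0, 10_000
tau = T / n
eps = rng.standard_normal(n)
x_euler = euler_path(x0, lambda x: mu * x, lambda x: sigma * x, tau, eps)

# Exact solution of (12.12) at time T, driven by the same Brownian increments
W_T = np.sqrt(tau) * eps.sum()
x_exact_T = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)
rel_error = abs(x_euler[-1] - x_exact_T) / x_exact_T
print(x_euler[-1], x_exact_T, rel_error)
```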


Convergence of GARCH-M Processes to Diffusions 


It is natural to assume that the return of a financial asset increases with risk. Economic agents who are risk-averse must receive compensation when they own risky assets. ARMA-GARCH type time series models do not take this requirement into account because the conditional mean and variance are modeled separately. A simple way to model the dependence between the average return and risk is to specify the conditional mean of the returns in the form

$$\mu_t = \xi + \lambda\sigma_t,$$

where $\xi$ and $\lambda$ are parameters. By doing so we obtain, when $\sigma_t^2$ is specified as an ARCH, a particular case of the ARCH in mean (ARCH-M) model, introduced by Engle, Lilien and Robins (1987). The parameter $\lambda$ can be interpreted as the price of risk and can thus be assumed to be positive. Other specifications of the conditional mean are obviously possible. In this section we focus on a GARCH(1, 1)-M model of the form

$$\begin{cases} X_t = X_{t-1} + f(\sigma_t) + \sigma_t\eta_t, & (\eta_t)\ \text{iid}\ (0, 1),\\ g(\sigma_t) = \omega + a(\eta_{t-1})g(\sigma_{t-1}), \end{cases} \qquad (12.16)$$

where $\omega > 0$, $f$ is a continuous function from $\mathbb{R}^+$ to $\mathbb{R}$, $g$ is a continuous one-to-one map from $\mathbb{R}^+$ to itself and $a$ is a positive function. The previous interpretation implies that $f$ is increasing, but at this point it is not necessary to make this assumption. When $g(x) = x^2$ and $a(x) = \alpha x^2 + \beta$ with $\alpha > 0$, $\beta > 0$, we get the classical GARCH(1, 1) model. Asymmetric effects can be introduced, for instance by taking $g(x) = x$ and $a(x) = \alpha_+x^+ - \alpha_-x^- + \beta$, with $x^+ = \max(x, 0)$, $x^- = \min(x, 0)$, $\alpha_+ > 0$, $\alpha_- > 0$, $\beta > 0$.

Observe that the constraint for the existence of a strictly stationary and nonanticipative solution $(Y_t)$, with $Y_t = X_t - X_{t-1}$, is written as

$$E\log\{a(\eta_t)\} < 0,$$

by the techniques studied in Chapter 2.
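Once the law of $\eta_t$ is fixed, the condition $E\log\{a(\eta_t)\} < 0$ is easy to evaluate numerically. The sketch below assumes Gaussian $\eta_t$ (the model only requires an iid $(0, 1)$ sequence) and uses Gauss–Hermite quadrature from NumPy's `hermite_e` module, whose weight function is $e^{-x^2/2}$; the GARCH(1, 1) coefficients are illustrative. When $\alpha + \beta < 1$, Jensen's inequality already gives $E\log a(\eta_t)\le\log(\alpha + \beta) < 0$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expected_log_a(a, n_nodes=80):
    """E log a(eta) for eta ~ N(0, 1), by Gauss-Hermite quadrature with weight exp(-x^2/2)."""
    x, w = hermegauss(n_nodes)
    return float(np.sum(w * np.log(a(x))) / np.sqrt(2.0 * np.pi))


# Classical GARCH(1, 1): g(x) = x^2 and a(x) = alpha x^2 + beta
stable = expected_log_a(lambda x: 0.1 * x**2 + 0.85)     # alpha + beta < 1
explosive = expected_log_a(lambda x: 0.5 * x**2 + 0.9)   # no stationary solution
print(stable, explosive)
```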
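Now, in view of the Euler discretization (12.15), we introduce the sequence of models indexed by the time unit $\tau$, defined by

$$\begin{cases} X^{(\tau)}_{k\tau} = X^{(\tau)}_{(k-1)\tau} + f\big(\sigma^{(\tau)}_{k\tau}\big)\tau + \sqrt{\tau}\,\sigma^{(\tau)}_{k\tau}\eta^{(\tau)}_{k\tau}, & \big(\eta^{(\tau)}_{k\tau}\big)\ \text{iid}\ (0, 1),\\ g\big(\sigma^{(\tau)}_{(k+1)\tau}\big) = \omega_\tau + a_\tau\big(\eta^{(\tau)}_{k\tau}\big)g\big(\sigma^{(\tau)}_{k\tau}\big), \end{cases} \qquad (12.17)$$

for $k > 0$, with initial values $X^{(\tau)}_0 = x_0$, $\sigma^{(\tau)}_0 = \sigma_0 > 0$ and assuming $E\big(\eta^{(\tau)}_{k\tau}\big)^4 < \infty$. The introduction of a delay $k$ in the second equation is due to the fact that $\sigma_{(k+1)\tau}$ belongs to the σ-field generated by $\eta_{k\tau}$ and its past values.

Noting that the pair $Z_{k\tau} = \big(X_{(k-1)\tau}, g(\sigma_{k\tau})\big)$ defines a Markov chain, we obtain its limiting distribution by application of Theorem 12.1. We have, for $z = (x, g(\sigma))$,

$$\tau^{-1}E\big(X_{k\tau} - X_{(k-1)\tau} \mid Z_{k\tau} = z\big) = f(\sigma), \qquad (12.18)$$

$$\tau^{-1}E\big(g(\sigma_{(k+1)\tau}) - g(\sigma_{k\tau}) \mid Z_{k\tau} = z\big) = \tau^{-1}\omega_\tau + \tau^{-1}\{Ea_\tau(\eta_{k\tau}) - 1\}g(\sigma). \qquad (12.19)$$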
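This latter quantity converges if

$$\lim_{\tau\to 0}\tau^{-1}\omega_\tau = \omega, \qquad \lim_{\tau\to 0}\tau^{-1}\{Ea_\tau(\eta_{k\tau}) - 1\} = -\delta, \qquad (12.20)$$

where $\omega$ and $\delta$ are constants.

Similarly,

$$\tau^{-1}\mathrm{Var}\big[X_{k\tau} - X_{(k-1)\tau} \mid Z_{k\tau} = z\big] = \sigma^2, \qquad (12.21)$$

$$\tau^{-1}\mathrm{Var}\big[g(\sigma_{(k+1)\tau}) - g(\sigma_{k\tau}) \mid Z_{k\tau} = z\big] = \tau^{-1}\mathrm{Var}\{a_\tau(\eta_{k\tau})\}g^2(\sigma), \qquad (12.22)$$

which converges if and only if

$$\lim_{\tau\to 0}\tau^{-1}\mathrm{Var}\{a_\tau(\eta_{k\tau})\} = \zeta, \qquad (12.23)$$

where $\zeta$ is a positive constant. Finally,

$$\tau^{-1}\mathrm{Cov}\big[X_{k\tau} - X_{(k-1)\tau},\ g(\sigma_{(k+1)\tau}) - g(\sigma_{k\tau}) \mid Z_{k\tau} = z\big] = \tau^{-1/2}\mathrm{Cov}\{\eta_{k\tau}, a_\tau(\eta_{k\tau})\}\sigma g(\sigma) \qquad (12.24)$$

converges if and only if

$$\lim_{\tau\to 0}\tau^{-1/2}\mathrm{Cov}\{\eta_{k\tau}, a_\tau(\eta_{k\tau})\} = \rho, \qquad (12.25)$$

where $\rho$ is a constant such that $\rho^2\le\zeta$.

Under these conditions we thus have

$$\lim_{\tau\to 0}\tau^{-1}\mathrm{Var}\big[Z_{(k+1)\tau} - Z_{k\tau} \mid Z_{k\tau} = z\big] = A(\sigma) = \begin{pmatrix}\sigma^2 & \rho\sigma g(\sigma)\\ \rho\sigma g(\sigma) & \zeta g^2(\sigma)\end{pmatrix}.$$

Moreover, we have

$$A(\sigma) = B(\sigma)B'(\sigma), \qquad\text{where}\qquad B(\sigma) = \begin{pmatrix}\sigma & 0\\ \rho g(\sigma) & \sqrt{\zeta - \rho^2}\,g(\sigma)\end{pmatrix}.$$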


We are now in a position to state our next result. 


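Theorem 12.2 (Convergence of $(X^{(\tau)}_{(k-1)\tau}, g(\sigma^{(\tau)}_{k\tau}))$ to a diffusion) Under Conditions (12.20), (12.23) and (12.25), and if, for some $\delta > 0$,

$$\limsup_{\tau\to 0}\tau^{-(1+\delta/2)}E\big|a_\tau(\eta_{k\tau}) - 1\big|^{2+\delta} < \infty, \qquad (12.26)$$

the limiting process when $\tau\to 0$, in the sense of Theorem 12.1, of the sequence of solutions of models (12.17) is the bivariate diffusion

$$\begin{cases} dX_t = f(\sigma_t)\,dt + \sigma_t\,dW_t^1,\\ dg(\sigma_t) = \{\omega - \delta g(\sigma_t)\}\,dt + g(\sigma_t)\big(\rho\,dW_t^1 + \sqrt{\zeta - \rho^2}\,dW_t^2\big), \end{cases} \qquad (12.27)$$

where $(W_t^1)$ and $(W_t^2)$ are independent Brownian motions, with initial values $x_0$ and $\sigma_0$.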
Proof. It suffices to verify the conditions of Theorem 12.1. It is immediate from (12.18), (12.19), (12.21), (12.22), (12.24) and the hypotheses on $f$ and $g$ that, in (12.8) and (12.9), the limits are uniform on every ball of $\mathbb{R}^2$. Moreover, for $\tau\le\tau_0$ and $\delta\le 2$, we have

$$\tau^{-(2+\delta)/2}E\big(|X_{k\tau} - X_{(k-1)\tau}|^{2+\delta} \mid Z_{k\tau} = z\big) = E\big(|f(\sigma)\tau^{1/2} + \sigma\eta_{k\tau}|^{2+\delta}\big) \le E\big(\{|f(\sigma)|\tau_0^{1/2} + \sigma|\eta_{k\tau}|\}^{2+\delta}\big),$$

which is bounded uniformly in $\sigma$ on every compact. On the other hand, introducing the $L^{2+\delta}$ norm and using the triangle and Hölder inequalities,

$$\tau^{-(2+\delta)/2}E\big(|g(\sigma_{(k+1)\tau}) - g(\sigma_{k\tau})|^{2+\delta} \mid Z_{k\tau} = z\big) = E\big|\tau^{-1/2}\omega_\tau + \tau^{-1/2}\{a_\tau(\eta_{k\tau}) - 1\}g(\sigma)\big|^{2+\delta}$$
$$\le\Big[\tau^{-1/2}\omega_\tau + \tau^{-1/2}\big\{E|a_\tau(\eta_{k\tau}) - 1|^{2+\delta}\big\}^{1/(2+\delta)}g(\sigma)\Big]^{2+\delta}.$$

Since the limit superior of this quantity, when $\tau\to 0$, is bounded uniformly in $\sigma$ on every compact, we can conclude that condition (12.10) is satisfied.

It remains to show that the SDE (12.27) admits a unique solution. Note that $g(\sigma_t)$ satisfies a linear SDE given by

$$dg(\sigma_t) = \{\omega - \delta g(\sigma_t)\}\,dt + \sqrt{\zeta}\,g(\sigma_t)\,dW_t^*, \qquad (12.28)$$

where $(W_t^*)$ is the Brownian motion $W_t^* = \big(\rho W_t^1 + \sqrt{\zeta - \rho^2}\,W_t^2\big)/\sqrt{\zeta}$. This equation admits a unique solution (Exercise 12.1)

$$g(\sigma_t) = Y_t\left\{g(\sigma_0) + \omega\int_0^t\frac{ds}{Y_s}\right\}, \qquad\text{where}\qquad Y_t = \exp\big\{-(\delta + \zeta/2)t + \sqrt{\zeta}\,W_t^*\big\}.$$

The function $g$ being one-to-one, we deduce $\sigma_t$ and the solution $(X_t)$, uniquely obtained as

$$X_t = x_0 + \int_0^tf(\sigma_s)\,ds + \int_0^t\sigma_s\,dW_s^1.$$
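The closed form of the solution of the linear SDE (12.28) can be compared with an Euler simulation driven by the same Brownian path. In this sketch the integral $\int_0^tY_s^{-1}ds$ is approximated by the trapezoidal rule, and the values of $g(\sigma_0)$, $\omega$, $\delta$ and $\zeta$ are illustrative assumptions.

```python
import numpy as np


def linear_sde_two_ways(g0, omega, delta, zeta, T, n_steps, rng):
    """dg = (omega - delta g) dt + sqrt(zeta) g dW*, on one Brownian path, computed two ways:
    the Euler scheme, and g_t = Y_t {g0 + omega int_0^t Y_s^{-1} ds} with
    Y_t = exp{-(delta + zeta/2) t + sqrt(zeta) W*_t} (integral by the trapezoidal rule)."""
    dt = T / n_steps
    dW = np.sqrt(dt) * rng.standard_normal(n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    t = np.linspace(0.0, T, n_steps + 1)
    Y = np.exp(-(delta + 0.5 * zeta) * t + np.sqrt(zeta) * W)
    inv_Y = 1.0 / Y
    integral = np.concatenate([[0.0], np.cumsum(0.5 * (inv_Y[1:] + inv_Y[:-1]) * dt)])
    g_closed = Y * (g0 + omega * integral)

    g_euler = np.empty(n_steps + 1)
    g_euler[0] = g0
    for k in range(n_steps):
        g_euler[k + 1] = (g_euler[k] + (omega - delta * g_euler[k]) * dt
                          + np.sqrt(zeta) * g_euler[k] * dW[k])
    return t, g_closed, g_euler


t, g_closed, g_euler = linear_sde_two_ways(0.2, 0.1, 0.5, 0.2, 1.0, 20_000, np.random.default_rng(5))
max_rel_gap = np.max(np.abs(g_closed - g_euler) / g_closed)
print(max_rel_gap)
```

With $\omega > 0$ and $g(\sigma_0) > 0$, the closed form is positive along the whole path, so $\sigma_t = g^{-1}\{g(\sigma_t)\}$ is well defined.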


Remark 12.2 


1. It is interesting to note that the limiting diffusion involves two Brownian motions, whereas GARCH processes involve only one noise. This can be explained by the fact that, to obtain a Markov chain, it is necessary to consider the pair $(X_{(k-1)\tau}, g(\sigma_{k\tau}))$. The Brownian motions involved in the equations of $X_t$ and $g(\sigma_t)$ are independent if and only if $\rho = 0$. This is, for instance, the case when the function $a_\tau$ is even and the distribution of the iid process is symmetric.

2. Equation (12.28) shows that $g(\sigma_t)$ is the solution of a linear model of the form (12.6). From the study of this model, we know that under the constraints

$$\omega > 0, \qquad \zeta > 0, \qquad 1 + \frac{2\delta}{\zeta} > 0,$$

there exists a stationary distribution for $g(\sigma_t)$. If the process is initialized with this distribution, then

$$\frac{1}{g(\sigma_t)}\sim\Gamma\left(\frac{2\omega}{\zeta}, 1 + \frac{2\delta}{\zeta}\right). \qquad (12.29)$$
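Example 12.2 (GARCH(1, 1)) The volatility of the GARCH(1, 1) model is obtained for $g(x) = x^2$, $a(x) = \alpha x^2 + \beta$. Suppose for simplicity that the distribution of the process $(\eta^{(\tau)}_{k\tau})$ does not depend on $\tau$ and admits moments of order $4(1 + \delta)$, for $\delta > 0$. Denote by $\mu_r$ the $r$th-order moment of this process. Conditions (12.20) and (12.23) take the form

$$\lim_{\tau\to 0}\tau^{-1}\omega_\tau = \omega, \qquad \lim_{\tau\to 0}\tau^{-1}(\mu_4 - 1)\alpha_\tau^2 = \zeta, \qquad \lim_{\tau\to 0}\tau^{-1}(\alpha_\tau + \beta_\tau - 1) = -\delta.$$

A choice of parameters satisfying these constraints is, for instance,

$$\omega_\tau = \omega\tau, \qquad \alpha_\tau = \sqrt{\frac{\zeta\tau}{\mu_4 - 1}}, \qquad \beta_\tau = 1 - \alpha_\tau - \delta\tau.$$

Condition (12.25) is then automatically satisfied with

$$\rho = \sqrt{\zeta}\,\frac{\mu_3}{\sqrt{\mu_4 - 1}},$$

as well as (12.26). The limiting diffusion takes the form

$$dX_t = f(\sigma_t)\,dt + \sigma_t\,dW_t^1,$$
$$d\sigma_t^2 = \{\omega - \delta\sigma_t^2\}\,dt + \sqrt{\zeta}\,\sigma_t^2\left(\frac{\mu_3}{\sqrt{\mu_4 - 1}}\,dW_t^1 + \sqrt{1 - \frac{\mu_3^2}{\mu_4 - 1}}\,dW_t^2\right)$$

and, if the law of $\eta_t$ is symmetric,

$$\begin{cases} dX_t = f(\sigma_t)\,dt + \sigma_t\,dW_t^1,\\ d\sigma_t^2 = \{\omega - \delta\sigma_t^2\}\,dt + \sqrt{\zeta}\,\sigma_t^2\,dW_t^2. \end{cases} \qquad (12.30)$$

Note that, with other choices of the rates of convergence of the parameters, we can obtain a limiting process involving only one Brownian motion but with a degenerate volatility equation, in the sense that it is an ordinary differential equation (Exercise 12.3).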
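The sequence of models (12.17) behind Example 12.2 is straightforward to simulate. The sketch below assumes the admissible choice $\omega_\tau = \omega\tau$, $\alpha_\tau = \sqrt{\zeta\tau/(\mu_4 - 1)}$, $\beta_\tau = 1 - \alpha_\tau - \delta\tau$, Gaussian $\eta$ (so $\mu_4 = 3$) and a hypothetical risk premium $f(\sigma) = \lambda\sigma$; none of these specific values come from the text. Arrays are indexed so that `(x[k], s2[k])` is the Markov pair $Z_{k\tau} = (X_{(k-1)\tau}, \sigma^2_{k\tau})$.

```python
import numpy as np


def garch_tau_params(omega, delta, zeta, mu4, tau):
    """Assumed admissible choice: omega_tau = omega tau,
    alpha_tau = sqrt(zeta tau / (mu4 - 1)), beta_tau = 1 - alpha_tau - delta tau."""
    alpha = np.sqrt(zeta * tau / (mu4 - 1.0))
    return omega * tau, alpha, 1.0 - alpha - delta * tau


def simulate_scheme_12_17(x0, sigma0, lam, omega, delta, zeta, tau, n_steps, rng):
    """Scheme (12.17) with g(x) = x^2, a_tau(x) = alpha_tau x^2 + beta_tau, Gaussian eta (mu4 = 3)
    and a hypothetical premium f(sigma) = lam * sigma; x[k] holds X_{(k-1)tau}, s2[k] holds sigma^2_{k tau}."""
    om, al, be = garch_tau_params(omega, delta, zeta, 3.0, tau)
    x = np.empty(n_steps + 1)
    s2 = np.empty(n_steps + 1)
    x[0], s2[0] = x0, sigma0**2
    eta = rng.standard_normal(n_steps)
    for k in range(n_steps):
        sig = np.sqrt(s2[k])
        x[k + 1] = x[k] + lam * sig * tau + np.sqrt(tau) * sig * eta[k]
        s2[k + 1] = om + (al * eta[k] ** 2 + be) * s2[k]
    return x, s2


tau = 1e-3
x, s2 = simulate_scheme_12_17(0.0, 0.4, 0.05, 0.1, 0.5, 0.2, tau, 5_000, np.random.default_rng(11))
print(garch_tau_params(0.1, 0.5, 0.2, 3.0, tau), s2[-1])
```

As $\tau$ decreases, $\alpha_\tau + \beta_\tau = 1 - \delta\tau$ approaches one: the discrete-time volatility becomes more and more persistent, which is what produces a nondegenerate diffusion limit.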


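Example 12.3 (TGARCH(1, 1)) For $g(x) = x$, $a(x) = \alpha_+x^+ - \alpha_-x^- + \beta$, we have the volatility of the threshold GARCH(1, 1) model. Under the assumptions of the previous example, let $\mu_{r+} = E(\eta_0^+)^r$ and $\mu_{r-} = E(-\eta_0^-)^r$. Conditions (12.20) and (12.23) take the form

$$\lim_{\tau\to 0}\tau^{-1}\omega_\tau = \omega, \qquad \lim_{\tau\to 0}\tau^{-1}\big\{\alpha_{\tau,+}^2\mu_{2+} + \alpha_{\tau,-}^2\mu_{2-} - (\alpha_{\tau,+}\mu_{1+} + \alpha_{\tau,-}\mu_{1-})^2\big\} = \zeta,$$
$$\lim_{\tau\to 0}\tau^{-1}(\alpha_{\tau,+}\mu_{1+} + \alpha_{\tau,-}\mu_{1-} + \beta_\tau - 1) = -\delta.$$

These constraints are satisfied by taking, for instance, $\alpha_{\tau,+} = \sqrt{\tau}\,\alpha_+$, $\alpha_{\tau,-} = \sqrt{\tau}\,\alpha_-$ and

$$\omega_\tau = \omega\tau, \qquad \alpha_+^2\mu_{2+} + \alpha_-^2\mu_{2-} - (\alpha_+\mu_{1+} + \alpha_-\mu_{1-})^2 = \zeta, \qquad \beta_\tau = 1 - \alpha_{\tau,+}\mu_{1+} - \alpha_{\tau,-}\mu_{1-} - \delta\tau.$$

Condition (12.25) is then satisfied with $\rho = \alpha_+\mu_{2+} - \alpha_-\mu_{2-}$, as well as condition (12.26). The limiting diffusion takes the form

$$dX_t = f(\sigma_t)\,dt + \sigma_t\,dW_t^1,$$
$$d\sigma_t = \{\omega - \delta\sigma_t\}\,dt + \sigma_t\big(\rho\,dW_t^1 + \sqrt{\zeta - \rho^2}\,dW_t^2\big).$$

In particular, if the law of $\eta_t$ is symmetric and if $\alpha_+ = \alpha_-$, the correlation between the Brownian motions of the two equations vanishes and we get a limiting diffusion of the form

$$dX_t = f(\sigma_t)\,dt + \sigma_t\,dW_t^1,$$
$$d\sigma_t = \{\omega - \delta\sigma_t\}\,dt + \sqrt{\zeta}\,\sigma_t\,dW_t^2.$$

By applying the Itô lemma to this equation, we obtain

$$d\sigma_t^2 = \{2\omega\sigma_t - (2\delta - \zeta)\sigma_t^2\}\,dt + 2\sqrt{\zeta}\,\sigma_t^2\,dW_t^2,$$

which shows that, even in the symmetric case, the limiting diffusion does not coincide with that obtained in (12.30) for the classical GARCH model. When the law of $\eta_t$ is symmetric and $\alpha_+\ne\alpha_-$, the asymmetry in the discrete-time model results in a correlation between the two Brownian motions of the limiting diffusion.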


12.2 Option Pricing 


Classical option pricing models rely on independent Gaussian returns processes. These assumptions 
are incompatible with the empirical properties of prices, as we saw, in particular, in the introductory 
chapter. It is thus natural to consider pricing models founded on more realistic, GARCH-type, or 
stochastic volatility, price dynamics. 

We start by briefly recalling the terminology and basic concepts related to the Black-Scholes 
model. Appropriate financial references are provided at the end of this chapter. 


12.2.1 Derivatives and Options 


The need to hedge against several types of risk gave rise to a number of financial assets called derivatives. A derivative (derivative security or contingent claim) is a financial asset whose payoff depends on the price process of an underlying asset: share, portfolio, stock index, currency, etc. The definition of this payoff is settled in a contract.

There are two basic types of option. A call option (put option), or more simply a call (put), is a derivative giving the holder the right, but not the obligation, to buy (sell) an agreed quantity of the underlying asset $S$ from (to) the seller of the option on (or before) the expiration date $T$, for a specified price $K$, the strike price or exercise price. The seller (or 'writer') of a call is obliged to sell the underlying asset should the buyer so decide. The buyer pays a fee, called a premium, for this right. The most common options, since their introduction in 1973, are the European options, which can be exercised only at the option expiry date, and the American options, which can be exercised at any time during the life of the option. For a European call option, the buyer receives, at the expiry date, the amount $\max(S_T - K, 0) = (S_T - K)^+$ since the option will not be exercised unless it is 'in the money'. Similarly, for a put, the payoff at time $T$ is $(K - S_T)^+$. Asset pricing involves determining the option price at time $t$. In what follows, we shall only consider European options.


12.2.2 The Black-Scholes Approach 


Consider a market with two assets, an underlying asset and a risk-free asset. The Black and Scholes (1973) model assumes that the price of the underlying asset is driven by a geometric Brownian motion

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \qquad (12.31)$$

where $\mu$ and $\sigma$ are constants and $(W_t)$ is a standard Brownian motion. The risk-free interest rate $r$ is assumed to be constant. By Itô's lemma, we obtain

$$d\log(S_t) = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t, \qquad (12.32)$$

showing that the logarithm of the price follows a generalized Brownian motion, with drift $\mu - \sigma^2/2$ and constant volatility. Integrating (12.32) between times $t - 1$ and $t$ yields the discretized version

$$\log\left(\frac{S_t}{S_{t-1}}\right) = \mu - \frac{\sigma^2}{2} + \sigma\epsilon_t, \qquad (\epsilon_t)\ \text{iid}\ \mathcal{N}(0, 1). \qquad (12.33)$$

The assumption of constant volatility is obviously unrealistic. However, this model allows for explicit formulas for option prices, or more generally for any derivative based on the underlying asset $S$, with payoff $g(S_T)$ at the expiry date $T$. The price of this product at time $t$ is unique under certain regularity conditions³ and is denoted by $C(S,t)$ for simplicity. The set of conditions ensuring the uniqueness of the derivative price is referred to as the complete market hypothesis. In particular, these conditions imply the absence of arbitrage opportunities, that is, that there is no ‘free lunch’. It can be shown⁴ that the derivative price is


$$C(S,t) = e^{-r(T-t)}\,E^{\pi}\left[g(S_T) \mid S_t\right], \tag{12.34}$$

where the expectation is computed under the probability $\pi$ corresponding to the equation

$$dS_t = rS_t\,dt + \sigma S_t\,dW_t^*, \tag{12.35}$$

where $(W_t^*)$ denotes a standard Brownian motion. The probability $\pi$ is called the risk-neutral probability, because under $\pi$ the expected return of the underlying asset is the risk-free interest rate $r$. It is important to distinguish this from the historic probability, that is, the law under which the data are generated (here defined by model (12.31)). Under the risk-neutral probability, the price process is still a geometric Brownian motion, with the same volatility $\sigma$ but with drift $r$. Note that the initial drift term, $\mu$, does not play a role in the pricing formula (12.34). Moreover, the actualized price $X_t = e^{-rt}S_t$ satisfies $dX_t = \sigma X_t\,dW_t^*$. This implies that the actualized price is a martingale for the risk-neutral probability: $e^{-r(T-t)}E^{\pi}[S_T \mid S_t] = S_t$. Note that this formula is obvious in view of (12.34), by considering the underlying asset as a product with payoff $S_T$ at time $T$.

The Black–Scholes formula is an explicit version of (12.34) when the derivative is a call, that is, when $g(S_T) = (S_T - K)^+$, given by

$$C(S,t) = S_t\,\Phi\big(x_t + \sigma\sqrt{\tau}\big) - e^{-r\tau}K\,\Phi(x_t), \tag{12.36}$$

where $\Phi$ is the cumulative distribution function (cdf) of the $\mathcal{N}(0,1)$ distribution and

$$x_t = \frac{\log\big(S_t/e^{-r\tau}K\big)}{\sigma\sqrt{\tau}} - \frac{1}{2}\sigma\sqrt{\tau}, \qquad \tau = T - t.$$

3 These conditions are the absence of transaction costs, the possibility of constructing a portfolio with any allocations (sign and size) of the two assets, the possibility of continuously adjusting the composition of the portfolio and the existence of a price for the derivative depending only on the present and past values of $S_t$.

4 The three classical methods for proving this formula are the method based on the binomial model (Cox, Ross and Rubinstein, 1979), the method based on the solution of partial differential equations and the method based on martingale theory.
In particular, it can be seen that if $S_t$ is large compared to $K$, we have $\Phi(x_t + \sigma\sqrt{\tau}) \approx \Phi(x_t) \approx 1$ and the call price is approximately given by $S_t - e^{-r\tau}K$, that is, the current underlying price minus the actualized exercise price. The price of a put $P(S,t)$ follows from the put–call parity relationship (Exercise 12.4): $C(S,t) = P(S,t) + S_t - e^{-r\tau}K$.
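As an illustration, the following Python sketch evaluates (12.36) and deduces the put price from the put–call parity relationship above. It relies only on the standard library (`statistics.NormalDist` supplies $\Phi$); the function names `bs_call` and `bs_put` and the numerical inputs are hypothetical, and $r$ and $\sigma$ are assumed to be expressed in the same time unit as $\tau$.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # cdf of the N(0, 1) distribution


def bs_call(S, K, r, sigma, tau):
    """European call price from (12.36); tau = T - t."""
    x = log(S / (exp(-r * tau) * K)) / (sigma * sqrt(tau)) - 0.5 * sigma * sqrt(tau)
    return S * PHI(x + sigma * sqrt(tau)) - exp(-r * tau) * K * PHI(x)


def bs_put(S, K, r, sigma, tau):
    """European put price deduced from put-call parity C = P + S_t - exp(-r tau) K."""
    return bs_call(S, K, r, sigma, tau) - S + exp(-r * tau) * K


# Hypothetical inputs: at-the-money option, one year to expiry, 5% rate, 20% volatility
call = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
put = bs_put(100.0, 100.0, 0.05, 0.2, 1.0)
```

With these inputs the call is worth about 10.45 and the put about 5.57, the usual reference values for this test case. For a deep in-the-money call, `bs_call` returns essentially $S_t - e^{-r\tau}K$, in line with the approximation stated above.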

A simple computation (Exercise 12.5) shows that the European call option price is an increasing function of $S_t$, which is intuitive. The derivative of $C(S,t)$ with respect to $S_t$, called delta, is used in the construction of a riskless hedge, that is, a portfolio of the risk-free and risky assets allowing the seller of a call to cover the risk of a loss when the option is exercised. The construction of a riskless hedge is often referred to as delta hedging.
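The delta has the standard closed form $\partial C/\partial S_t = \Phi(x_t + \sigma\sqrt{\tau})$, a known consequence of (12.36) that the text does not derive. The minimal sketch below (function names are hypothetical) cross-checks this expression against a central finite difference of the Black–Scholes price, and illustrates that the delta lies in $(0,1)$ and increases with $S_t$.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price (12.36)."""
    x = log(S / (exp(-r * tau) * K)) / (sigma * sqrt(tau)) - 0.5 * sigma * sqrt(tau)
    return S * PHI(x + sigma * sqrt(tau)) - exp(-r * tau) * K * PHI(x)


def bs_delta(S, K, r, sigma, tau):
    """Closed-form delta dC/dS_t = Phi(x_t + sigma sqrt(tau))."""
    x = log(S / (exp(-r * tau) * K)) / (sigma * sqrt(tau)) - 0.5 * sigma * sqrt(tau)
    return PHI(x + sigma * sqrt(tau))


def fd_delta(S, K, r, sigma, tau, dS=1e-4):
    """Central finite-difference approximation of dC/dS_t, used as a cross-check."""
    return (bs_call(S + dS, K, r, sigma, tau) - bs_call(S - dS, K, r, sigma, tau)) / (2 * dS)
```

In delta hedging, the seller of one call would hold `bs_delta(...)` units of the underlying asset, rebalancing as $S_t$ and $\tau$ change.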

The previous approach can be extended to other price processes, in particular if $(S_t)$ is a solution of an SDE of the form

$$dS_t = \mu(t, S_t)\,dt + \sigma(t, S_t)\,dW_t,$$

under regularity assumptions on $\mu$ and $\sigma$. When the geometric Brownian motion for $S_t$ is replaced by other dynamics, the complete market property is generally lost.⁵


12.2.3 Historic Volatility and Implied Volatilities 


Note that, from a statistical point of view, the sole unknown parameter in the Black–Scholes pricing formula (12.36) is the volatility of the underlying asset. Assuming that the prices follow a geometric Brownian motion, application of this formula thus requires estimating $\sigma$. Any estimate of $\sigma$ based on a history of prices $S_0, \ldots, S_t$ is referred to as historic volatility. For geometric Brownian motion, the log-returns $\log(S_t/S_{t-1})$ are, by (12.32), iid $\mathcal{N}(\mu - \sigma^2/2, \sigma^2)$ distributed variables. Several estimation methods for $\sigma$ can be considered, such as the method of moments and the maximum likelihood method (Exercise 12.7). An estimator of $C(S,t)$ is then obtained by replacing $\sigma$ by its estimate.
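A minimal sketch of historic volatility estimation, assuming the discretized model (12.33) with one observation per time unit. The Gaussian maximum likelihood estimator of $\sigma^2$ is the empirical variance of the log-returns (divided by $n$); the method of moments leads to the same estimator, or to the version divided by $n-1$ if the unbiased variance is used. The drift estimate adds back $\hat\sigma^2/2$. The simulated data, parameter values and function names are illustrative only.

```python
import math
import random


def simulate_log_returns(mu, sigma, n, seed=0):
    """Draw n log-returns from (12.33): log(S_t / S_{t-1}) = mu - sigma^2/2 + sigma * eps_t."""
    rng = random.Random(seed)
    return [mu - sigma ** 2 / 2 + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


def historic_volatility(log_returns):
    """Gaussian ML estimates (mu_hat, sigma_hat) for iid N(mu - sigma^2/2, sigma^2) log-returns."""
    n = len(log_returns)
    mean = sum(log_returns) / n
    var_ml = sum((x - mean) ** 2 for x in log_returns) / n  # ML estimator divides by n
    mu_hat = mean + var_ml / 2                                # undo the -sigma^2/2 shift of the mean
    return mu_hat, math.sqrt(var_ml)


# Illustrative use on simulated daily data (hypothetical parameter values)
mu_hat, sigma_hat = historic_volatility(simulate_log_returns(0.0005, 0.01, 5000, seed=1))
```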

Another approach involves using option prices. In practice, traders usually work with the so-called implied volatilities. These are the volatilities implied by option prices observed in the market. Consider a European call option whose price at time $t$ is $\tilde{C}_t$. If $S_t$ denotes the price of the underlying asset at time $t$, an implied volatility $\sigma_t^I$ is defined by solving the equation

$$\tilde{C}_t = S_t\,\Phi\big(x_t + \sigma_t^I\sqrt{\tau}\big) - e^{-r\tau}K\,\Phi(x_t), \qquad x_t = \frac{\log\big(S_t/e^{-r\tau}K\big)}{\sigma_t^I\sqrt{\tau}} - \frac{1}{2}\sigma_t^I\sqrt{\tau}.$$

This equation cannot be solved analytically and numerical procedures are called for. Note that the solution is unique because the call price is an increasing function of $\sigma$ (Exercise 12.8).
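Because the call price is increasing in $\sigma$, a simple bisection is enough to invert the Black–Scholes formula. The sketch below is one possible numerical procedure among many (Newton's method is a common alternative); the search interval `[lo, hi]`, the tolerance and the function names are arbitrary assumptions made for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price (12.36)."""
    x = log(S / (exp(-r * tau) * K)) / (sigma * sqrt(tau)) - 0.5 * sigma * sqrt(tau)
    return S * PHI(x + sigma * sqrt(tau)) - exp(-r * tau) * K * PHI(x)


def implied_vol(price, S, K, r, tau, lo=1e-6, hi=5.0, tol=1e-10, max_iter=200):
    """Solve bs_call(S, K, r, sigma, tau) = price for sigma by bisection.

    Valid because the call price is increasing in sigma, so the root in [lo, hi] is unique."""
    if not bs_call(S, K, r, lo, tau) <= price <= bs_call(S, K, r, hi, tau):
        raise ValueError("observed price is outside the range spanned by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, tau) < price:
            lo = mid  # model price too low: the volatility must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Applied to quotes with different strikes or maturities on the same underlying, such a routine would produce the unstable implied volatilities discussed next.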

If the assumptions of the Black-Scholes model, that is, the geometric Brownian motion, are 
satisfied, implied volatilities calculated from options with different characteristics but the same 
underlying asset should coincide with the theoretical volatility $\sigma$. In practice, implied volatilities
calculated with different strikes or expiration dates are very unstable, which is not surprising since 
we know that the geometric Brownian motion is a misspecified model. 


12.2.4 Option Pricing when the Underlying Process is a GARCH 


In discrete time, with time unit $\delta$, the binomial model (in which, given $S_t$, $S_{t+\delta}$ can only take two values) allows us to define a unique risk-neutral probability, under which the actualized price is


5 This is the case for the stochastic volatility model of Hull and White (1987), defined by

$$dS_t = \phi S_t\,dt + \sigma_t S_t\,dW_t, \qquad d\sigma_t^2 = \mu\sigma_t^2\,dt + \xi\sigma_t^2\,dW_t^*,$$

where $W_t$ and $W_t^*$ are independent Brownian motions.
a martingale. This model is used, in the Cox, Ross and Rubinstein (1979) approach, as a discrete-time analog of the geometric Brownian motion. Intuitively, the assumption of a complete market is satisfied (in the binomial model as well as in the Black–Scholes model) because the number of assets, two, coincides with the number of states of the world at each date. Apart from this simple situation, the complete market property is generally lost in discrete time. It follows that there may exist many probability measures under which the actualized prices are martingales and, consequently, many pricing formulas such as (12.34). Roughly speaking, there is too much variability in prices between consecutive dates.
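The binomial model can be sketched as follows, assuming the usual Cox–Ross–Rubinstein parametrization $u = e^{\sigma\sqrt{\delta}}$, $d = 1/u$ (a choice not specified in the text; `crr_call` is a hypothetical name). The up-move probability $q = (e^{r\delta} - d)/(u - d)$ is the unique probability making the actualized price a martingale, and as the number of steps grows the price approaches the Black–Scholes value (12.36).

```python
from math import comb, exp, sqrt


def crr_call(S, K, r, sigma, tau, n):
    """European call in a Cox-Ross-Rubinstein binomial tree with n steps of length delta = tau / n."""
    delta = tau / n
    u = exp(sigma * sqrt(delta))          # up factor
    d = 1.0 / u                           # down factor
    q = (exp(r * delta) - d) / (u - d)    # unique risk-neutral probability of an up move
    # Expected payoff under q: the terminal price has k up moves with probability C(n,k) q^k (1-q)^(n-k)
    expected = sum(
        comb(n, k) * q ** k * (1 - q) ** (n - k) * max(S * u ** k * d ** (n - k) - K, 0.0)
        for k in range(n + 1)
    )
    return exp(-r * tau) * expected
```

The martingale property is the one-step identity $q u + (1-q)d = e^{r\delta}$, which holds by construction of $q$.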

To determine option prices in incomplete markets, additional assumptions can be made on
the risk premium and/or the preferences of the agents. A modern alternative relies on the concept 
of stochastic discount factor, which allows pricing formulas in discrete time similar to those in 
continuous time to be obtained. 


Stochastic Discount Factor 


We start by considering a general setting. Suppose that we observe a vector process $Z = (Z_t)$ and let $I_t$ denote the information available at time $t$, that is, the $\sigma$-field generated by $\{Z_s, s \le t\}$. We are interested in the pricing of a derivative whose payoff is $g = g(Z_T)$ at time $T$. Suppose that there exists, at time $t < T$, a price $C_t(Z, g, T)$ for this asset. It can be shown that, under mild assumptions⁶ on the function $g \mapsto C_t(Z, g, T)$, we have the representation

$$C(g,t,T) = E\left[g(Z_T)M_{t,T} \mid I_t\right], \quad \text{where } M_{t,T} > 0,\ M_{t,T} \in I_T. \tag{12.37}$$


The variable $M_{t,T}$ is called the stochastic discount factor (SDF) for the period $[t,T]$. The SDF introduced in representation (12.37) is not unique and can be parameterized. The formula applies, in particular, to the zero-coupon bond of expiry date $T$, defined as the asset with payoff 1 at time $T$. Its price at $t$ is the conditional expectation of the SDF,

$$B(t,T) = C(1,t,T) = E\left[M_{t,T} \mid I_t\right].$$

It follows that (12.37) can be written as

$$C(g,t,T) = B(t,T)\,E\left[g(Z_T)\frac{M_{t,T}}{B(t,T)} \,\middle|\, I_t\right]. \tag{12.38}$$


Forward Risk-Neutral Probability 


Observe that the ratio $M_{t,T}/B(t,T)$ is positive and that its mean, conditional on $I_t$, is 1. Consequently, a probability change removing this factor in formula (12.38) can be done.⁷ Denoting by $\pi_{t,T}$ the new probability and by $E^{\pi_{t,T}}$ the expectation under this probability, we obtain the pricing formula

$$C(g,t,T) = B(t,T)\,E^{\pi_{t,T}}\left[g(Z_T) \mid I_t\right]. \tag{12.39}$$

The probability law $\pi_{t,T}$ is called the forward risk-neutral probability. Note the analogy between this formula and (12.34), the latter corresponding to a particular form of $B(t,T)$. To make this formula operational, it remains to specify the SDF.


6 In particular, linearity and positivity (see Hansen and Richard, 1987; Gouriéroux and Tiomo, 2007).

7 If $X$ is a random variable with distribution $P_X$ and $G$ is a function with real positive values such that $E\{G(X)\} = 1$, we have, interpreting $G$ as the density of some measure $P_X^*$ with respect to $P_X$,

$$E\{XG(X)\} = \int x\,G(x)\,dP_X(x) = \int x\,dP_X^*(x) =: E^*(X).$$
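To make the change of probability behind (12.38) and (12.39) concrete, here is a toy sketch with a finite number of states for $Z_T$ and trivial information $I_t$. The historic probabilities, SDF values and payoffs are made-up numbers (the only requirement is $M_{t,T} > 0$), and the function names are hypothetical. The forward risk-neutral weights are $P(z)M_{t,T}(z)/B(t,T)$, as in footnote 7, and both pricing formulas return the same value.

```python
def forward_risk_neutral(probs, sdf):
    """Return B(t,T) = E[M_{t,T}] and the forward risk-neutral weights p * M / B (footnote 7)."""
    B = sum(p * m for p, m in zip(probs, sdf))
    weights = [p * m / B for p, m in zip(probs, sdf)]
    return B, weights


def price_historic(probs, sdf, payoff):
    """Formula (12.37): C = E[g(Z_T) M_{t,T}] under the historic probability."""
    return sum(p * m * g for p, m, g in zip(probs, sdf, payoff))


def price_forward(probs, sdf, payoff):
    """Formula (12.39): C = B(t,T) E^{pi_{t,T}}[g(Z_T)]."""
    B, weights = forward_risk_neutral(probs, sdf)
    return B * sum(w * g for w, g in zip(weights, payoff))


# Hypothetical three-state example
probs = [0.3, 0.5, 0.2]    # historic probabilities of the states of Z_T
sdf = [1.1, 0.95, 0.8]     # positive SDF values M_{t,T}
payoff = [0.0, 5.0, 12.0]  # g(Z_T) in each state
```

Pricing the payoff identically equal to 1 recovers the zero-coupon bond price $B(t,T)$.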
Risk-Neutral Probability


As mentioned earlier, the SDF is not unique in incomplete markets. A natural restriction, referred to as a temporal coherence restriction, is given by

$$M_{t,T} = M_{t,t+1}M_{t+1,t+2}\cdots M_{T-1,T}. \tag{12.40}$$

On the other hand, the one-step SDFs are constrained by

$$B(t,t+1) = E\left[M_{t,t+1} \mid I_t\right], \qquad S_t = E\left[S_{t+1}M_{t,t+1} \mid I_t\right], \tag{12.41}$$

where $S_t \in I_t$ is the price of an underlying asset (or a vector of assets). We have

$$C(g,t,T) = B(t,t+1)\,E\left[g(Z_T)\prod_{i=t+1}^{T-1}B(i,i+1)\prod_{i=t}^{T-1}\frac{M_{i,i+1}}{B(i,i+1)} \,\middle|\, I_t\right]. \tag{12.42}$$

Noting that

$$E\left[\prod_{i=t}^{T-1}\frac{M_{i,i+1}}{B(i,i+1)} \,\middle|\, I_t\right] = E\left[\prod_{i=t}^{T-2}\frac{M_{i,i+1}}{B(i,i+1)}\,E\left(\frac{M_{T-1,T}}{B(T-1,T)} \,\middle|\, I_{T-1}\right) \,\middle|\, I_t\right] = \cdots = 1,$$

we can make a change of probability such that the SDF vanishes. Under a probability law $\pi_{t,T}^*$, called the risk-neutral probability, we thus have

$$C(g,t,T) = B(t,t+1)\,E^{\pi_{t,T}^*}\left[g(Z_T)\prod_{i=t+1}^{T-1}B(i,i+1) \,\middle|\, I_t\right]. \tag{12.43}$$

The risk-neutral probability satisfies the temporal coherence property: $\pi_{t,T+1}^*$ is related to $\pi_{t,T}^*$ through the factor $M_{T,T+1}/B(T,T+1)$. Without (12.40), the forward risk-neutral probability does not satisfy this property.


Pricing Formulas 


One approach to deriving pricing formulas is to specify, parametrically, the dynamics of $Z_t$ and of $M_{t,t+1}$, taking (12.41) into account.


Example 12.4 (Black–Scholes model) Consider model (12.33) with $Z_t = \log(S_t/S_{t-1})$ and suppose that $B(t,t+1) = e^{-r}$. A simple specification of the one-step SDF is given by the affine model $M_{t,t+1} = \exp(a + bZ_{t+1})$, where $a$ and $b$ are constants. The constraints in (12.41) are written as

$$e^{-r} = E\exp(a + bZ_{t+1}), \qquad 1 = E\exp\{a + (b+1)Z_{t+1}\},$$

that is, in view of the $\mathcal{N}(\mu - \sigma^2/2, \sigma^2)$ distribution of $Z_{t+1}$,

$$0 = a + r + b\left(\mu - \frac{\sigma^2}{2}\right) + \frac{b^2\sigma^2}{2}, \qquad 0 = a + (b+1)\left(\mu - \frac{\sigma^2}{2}\right) + \frac{(b+1)^2\sigma^2}{2}.$$


These equations provide a unique solution $(a, b)$. We then obtain the risk-neutral probability $\pi_{t,t+1} = \pi$ through the moment generating function

$$E^{\pi}\left(e^{uZ_{t+1}}\right) = E\left(e^{uZ_{t+1}}\frac{M_{t,t+1}}{B(t,t+1)}\right) = E\left(e^{r + a + (u+b)Z_{t+1}}\right) = e^{u(r - \sigma^2/2) + u^2\sigma^2/2}.$$

Because the latter is the moment generating function of the $\mathcal{N}(r - \sigma^2/2, \sigma^2)$ distribution, we retrieve the geometric Brownian motion model (12.35) for $(S_t)$. The Black–Scholes formula is thus obtained by specifying an affine exponential SDF with constant coefficients.
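The two equations of Example 12.4 can be solved in closed form: subtracting the first from the second gives $b = (r - \mu)/\sigma^2$, and the first then gives $a$. The sketch below (function names are hypothetical) returns $(a, b)$ and the logarithm of the risk-neutral moment generating function, so that one can check numerically that it coincides with that of $\mathcal{N}(r - \sigma^2/2, \sigma^2)$.

```python
def bs_sdf_coefficients(mu, sigma, r):
    """Solve the two constraints of Example 12.4 for (a, b), with Z ~ N(mu - sigma^2/2, sigma^2)."""
    m = mu - sigma ** 2 / 2                    # mean of Z_{t+1} under the historic probability
    b = (r - mu) / sigma ** 2                  # second equation minus first equation
    a = -r - b * m - b ** 2 * sigma ** 2 / 2   # first equation
    return a, b


def risk_neutral_log_mgf(u, mu, sigma, r):
    """log E^pi[exp(u Z)] = r + a + log E[exp((u + b) Z)], using the Gaussian mgf."""
    a, b = bs_sdf_coefficients(mu, sigma, r)
    m = mu - sigma ** 2 / 2
    return r + a + (u + b) * m + (u + b) ** 2 * sigma ** 2 / 2
```

Note that $b$ depends on the drift only through the excess return $\mu - r$, which is why $\mu$ disappears from the risk-neutral law.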


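Now consider a general GARCH-type model of the form

$$\begin{aligned}
Z_t = \log(S_t/S_{t-1}) &= \mu_t + \epsilon_t,\\
\epsilon_t &= \sigma_t\eta_t, \qquad (\eta_t) \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1),
\end{aligned} \tag{12.44}$$

where $\mu_t$ and $\sigma_t$ belong to the $\sigma$-field generated by the past of $Z_t$, with $\sigma_t > 0$. Suppose that the $\sigma$-fields generated by the past of $\epsilon_t$, $Z_t$ and $\eta_t$ are the same, and denote this $\sigma$-field by $I_{t-1}$. Suppose again that $B(t,t+1) = e^{-r}$. Consider for the SDF an affine exponential specification with random coefficients, given by

$$M_{t,t+1} = \exp(a_t + b_t\eta_{t+1}), \tag{12.45}$$

where $a_t, b_t \in I_t$. The constraints (12.41) are written as

$$e^{-r} = E\left[\exp(a_t + b_t\eta_{t+1}) \mid I_t\right],$$

$$1 = E\left[\exp\{a_t + b_t\eta_{t+1} + Z_{t+1}\} \mid I_t\right] = E\left[\exp\{a_t + \mu_{t+1} + (b_t + \sigma_{t+1})\eta_{t+1}\} \mid I_t\right],$$

that is, after simplification,

$$a_t = -r - \frac{b_t^2}{2}, \qquad b_t\sigma_{t+1} = r - \mu_{t+1} - \frac{\sigma_{t+1}^2}{2}. \tag{12.46}$$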


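As before, these equations provide a unique solution $(a_t, b_t)$. The risk-neutral probability $\pi_{t,t+1}$ is defined through the moment generating function

$$\begin{aligned}
E^{\pi_{t,t+1}}\left(e^{uZ_{t+1}} \mid I_t\right) &= E\left(e^{uZ_{t+1}}\frac{M_{t,t+1}}{B(t,t+1)} \,\middle|\, I_t\right)\\
&= E\left(e^{r + a_t + u\mu_{t+1} + (b_t + u\sigma_{t+1})\eta_{t+1}} \,\middle|\, I_t\right)\\
&= \exp\left(u(\mu_{t+1} + b_t\sigma_{t+1}) + u^2\frac{\sigma_{t+1}^2}{2}\right)\\
&= \exp\left(u\left(r - \frac{\sigma_{t+1}^2}{2}\right) + u^2\frac{\sigma_{t+1}^2}{2}\right).
\end{aligned}$$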


The last two equalities are obtained by taking into account the constraints on $a_t$ and $b_t$. Thus, under the probability $\pi_{t,t+1}$, the law of the process $(Z_t)$ is given by the model

$$\begin{aligned}
Z_t &= r - \frac{\sigma_t^2}{2} + \epsilon_t^*,\\
\epsilon_t^* &= \sigma_t\eta_t^*, \qquad (\eta_t^*) \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1).
\end{aligned} \tag{12.47}$$

The independence of the $\eta_t^*$ follows from the independence between $\eta_t^*$ and $I_{t-1}$ (because $\eta_t^*$ has a fixed distribution conditional on $I_{t-1}$) and from the fact that $\eta_t^* = \sigma_t^{-1}\left(Z_t - r + \sigma_t^2/2\right) \in I_t$. The model under the risk-neutral probability is then a GARCH-type model if the variable $\sigma_t^2$ is a measurable function of the past of $\epsilon_t^*$. This generally does not hold because the relation

$$\epsilon_t^* = \mu_t - r + \frac{\sigma_t^2}{2} + \epsilon_t \tag{12.48}$$

entails that the past of $\epsilon_t^*$ is included in the past of $\epsilon_t$, but not the reverse.
If the relation (12.48) is invertible, in the sense that there exists a measurable function $f$ such that $\epsilon_t = f(\epsilon_t^*, \epsilon_{t-1}^*, \ldots)$, model (12.47) is of the GARCH type, but the volatility $\sigma_t^2$ can take a very complicated form as a function of the $\epsilon_{t-j}^*$. Specifically, if the volatility under the historic probability is that of a classical GARCH(1,1), we have

$$\sigma_t^2 = \omega + \alpha\left(r - \frac{\sigma_{t-1}^2}{2} - \mu_{t-1} + \epsilon_{t-1}^*\right)^2 + \beta\sigma_{t-1}^2.$$


Finally, using $\pi_{t,T}$, the forward risk-neutral probability for the expiry date $T$, the price of a derivative is given by the formula

$$C_t(Z,g,T) = e^{-r(T-t)}\,E^{\pi_{t,T}}\left[g(Z_T) \mid I_t\right] \tag{12.49}$$

or, under the historic probability, in view of (12.40),

$$C_t(Z,g,T) = E\left[g(Z_T)M_{t,t+1}M_{t+1,t+2}\cdots M_{T-1,T} \mid I_t\right]. \tag{12.50}$$

It is important to note that, with the affine exponential specification of the SDF, the volatilities coincide under the two probability measures. This will not be the case for other SDFs (Exercise 12.11).


Example 12.5 (Constant coefficient SDF) We have seen that the Black–Scholes formula relies on (i) a Gaussian marginal distribution for the log-returns, and (ii) an affine exponential SDF with constant coefficients. In the framework of model (12.44), it is thus natural to look for an SDF of the same type, with $a_t$ and $b_t$ independent of $t$. It is immediate from (12.46) that it is necessary and sufficient to take $\mu_t$ of the form

$$\mu_t = \mu + \lambda\sigma_t - \frac{\sigma_t^2}{2},$$

where $\mu$ and $\lambda$ are constants. We thus obtain a model of GARCH-in-mean type, because the conditional mean is a function of the conditional variance. The volatility in the risk-neutral model is thus written as

$$\sigma_t^2 = \omega + \alpha\left\{(r - \mu) - \lambda\sigma_{t-1} + \epsilon_{t-1}^*\right\}^2 + \beta\sigma_{t-1}^2.$$

If, moreover, $r = \mu$, then under the historic probability the model is expressed as

$$\begin{aligned}
\log(S_t/S_{t-1}) &= r + \lambda\sigma_t - \frac{\sigma_t^2}{2} + \epsilon_t,\\
\epsilon_t &= \sigma_t\eta_t, \qquad (\eta_t) \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1),\\
\sigma_t^2 &= \omega + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \qquad \omega > 0,\ \alpha, \beta \ge 0,
\end{aligned} \tag{12.51}$$

and under the risk-neutral probability,

$$\begin{aligned}
\log(S_t/S_{t-1}) &= r - \frac{\sigma_t^2}{2} + \epsilon_t^*,\\
\epsilon_t^* &= \sigma_t\eta_t^*, \qquad (\eta_t^*) \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1),\\
\sigma_t^2 &= \omega + \alpha\left(\epsilon_{t-1}^* - \lambda\sigma_{t-1}\right)^2 + \beta\sigma_{t-1}^2.
\end{aligned} \tag{12.52}$$

Under the latter probability, the actualized price $e^{-rt}S_t$ is a martingale (Exercise 12.9). Note that in (12.52) the coefficient $\lambda$ appears in the conditional variance, but the risk has been neutralized (the conditional mean no longer contains the risk premium $\lambda\sigma_t$). This risk-neutral probability was obtained by Duan (1995) using a different approach based on the utility of the representative agent.
Note that $\sigma_t^2 = \omega + \sigma_{t-1}^2\left\{\alpha(\eta_{t-1}^* - \lambda)^2 + \beta\right\}$. Under the strict stationarity constraint

$$E\log\left\{\alpha(\lambda - \eta_t^*)^2 + \beta\right\} < 0,$$

the variable $\sigma_t^2$ is a function of the past of $\eta_t^*$ and can be interpreted as the volatility of $Z_t$ under the risk-neutral probability. Under this probability, the model is not a classical GARCH(1,1) unless $\lambda = 0$, but in this case the risk-neutral probability coincides with the historic one, which is not the case in practice.
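The strict stationarity condition above involves an expectation that has no simple closed form in general. A minimal Monte Carlo sketch, with a hypothetical function name and arbitrary illustrative parameter values, is given below; it estimates $E\log\{\alpha(\lambda - \eta)^2 + \beta\}$ for $\eta \sim \mathcal{N}(0,1)$, and a clearly negative estimate indicates that the condition holds.

```python
import math
import random


def rn_lyapunov(alpha, beta, lam, n=50_000, seed=0):
    """Monte Carlo estimate of E log{alpha (lam - eta)^2 + beta}, eta ~ N(0, 1).

    A negative value is the strict stationarity condition for the risk-neutral
    volatility sigma_t^2 = omega + sigma_{t-1}^2 {alpha (eta*_{t-1} - lam)^2 + beta}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        total += math.log(alpha * (lam - eta) ** 2 + beta)
    return total / n
```

When $\alpha(1 + \lambda^2) + \beta < 1$ the condition holds automatically by Jensen's inequality; the simulation is mainly useful outside that region.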


Numerical Pricing of Option Prices 


Explicit computation of the expectation involved in (12.49) is not possible, but the expectation can be evaluated by simulation. Note that, under $\pi_{t,T}$, $S_T$ and $S_t$ are linked by the formula

$$S_T = S_t\exp\left\{(T-t)r - \frac{1}{2}\sum_{s=t+1}^{T}h_s + \sum_{s=t+1}^{T}\sqrt{h_s}\,\eta_s^*\right\},$$

where $h_s = \sigma_s^2$. At time $t$, suppose that an estimate $\hat\theta = (\hat\lambda, \hat\omega, \hat\alpha, \hat\beta)$ of the coefficients of model (12.51) is available, obtained from observations $S_1, \ldots, S_t$ of the underlying asset. Simulated values $S_T^{(i)}$ of $S_T$, and thus simulated values $Z_T^{(i)}$ of $Z_T$, for $i = 1, \ldots, N$, are obtained by simulating, at step $i$, $T - t$ independent realizations $\nu_s^{(i)}$, $s = t+1, \ldots, T$, of the $\mathcal{N}(0,1)$ distribution and by setting

$$S_T^{(i)} = S_t\exp\left\{(T-t)r - \frac{1}{2}\sum_{s=t+1}^{T}h_s^{(i)} + \sum_{s=t+1}^{T}\sqrt{h_s^{(i)}}\,\nu_s^{(i)}\right\},$$

where the $h_s^{(i)}$, $s = t+2, \ldots, T$, are recursively computed from

$$h_s^{(i)} = \hat\omega + \hat\alpha\,h_{s-1}^{(i)}\left(\nu_{s-1}^{(i)} - \hat\lambda\right)^2 + \hat\beta\,h_{s-1}^{(i)},$$

taking, for instance, as initial value $h_{t+1}^{(i)} = \hat\sigma_{t+1}^2$, the volatility estimated from the initial GARCH model (this volatility being computed recursively, and the effect of the initialization being negligible for $t$ large enough under stationarity assumptions). This choice can be justified by noting that for SDFs of the form (12.45), the volatilities coincide under the historic and risk-neutral probabilities. Finally, a simulation of the derivative price is obtained by taking

$$\hat{C}_t(Z,g,T) = e^{-(T-t)r}\,\frac{1}{N}\sum_{i=1}^{N}g\left(Z_T^{(i)}\right).$$
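The simulation scheme above can be sketched in a few lines for a European call, $g(Z_T) = (S_T - K)^+$. This is a plain Monte Carlo illustration, not an optimized implementation: the parameters are treated as known (in practice they would be the estimates $\hat\theta$), the argument `h_next` plays the role of $\hat\sigma_{t+1}^2$, all quantities are per period (for example daily), and the function name is hypothetical. With $\alpha = \beta = 0$ the volatility is constant, so the result should be close to the Black–Scholes price, which gives a simple sanity check.

```python
import math
import random


def garch_call_mc(S_t, K, r, n_days, omega, alpha, beta, lam, h_next, n_paths=20_000, seed=0):
    """Monte Carlo price of a European call under the risk-neutral GARCH model (12.52).

    h_next stands for sigma_{t+1}^2 computed at time t from the fitted historic model;
    r, omega and h_next are per-period quantities and n_days = T - t."""
    rng = random.Random(seed)
    payoff_sum = 0.0
    for _path in range(n_paths):
        h = h_next
        log_growth = 0.0
        for _step in range(n_days):
            nu = rng.gauss(0.0, 1.0)                            # nu_s ~ N(0, 1) under pi_{t,T}
            log_growth += r - 0.5 * h + math.sqrt(h) * nu       # Z_s = r - h_s/2 + sqrt(h_s) nu_s
            h = omega + alpha * h * (nu - lam) ** 2 + beta * h  # h_{s+1} from (12.52)
        S_T = S_t * math.exp(log_growth)
        payoff_sum += max(S_T - K, 0.0)
    return math.exp(-r * n_days) * payoff_sum / n_paths
```

For instance, with $\alpha = \beta = \lambda = 0$, $\omega = h_{t+1} = 10^{-4}$, $r = 2\times 10^{-4}$, $T - t = 20$ and $S_t = K = 100$, the Black–Scholes value is about 1.99, and the simulated price should be close to it up to Monte Carlo error.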


The previous approach can obviously be extended to more general GARCH models, with larger 
orders and/or different volatility specifications. It can also be extended to other SDFs. 

Empirical studies comparing the computed prices with actual prices observed on the market show that GARCH option pricing provides much better results than the classical Black–Scholes approach (see, for instance, Sabbatini and Linton, 1998; Härdle and Hafner, 2000).

To conclude this section, we observe that the formula providing the theoretical option prices 
can also be used to estimate the parameters of the underlying process, using observed options (see, 
for instance, Hsieh and Ritchken, 2005). 
12.3 Value at Risk and Other Risk Measures


Risk measurement is becoming more and more important in the financial risk management of 
banks and other institutions involved in financial markets. The need to quantify risk typically 
arises when a financial institution has to determine the amount of capital to hold as a protection 
against unexpected losses. In fact, risk measurement is concerned with all types of risks encountered 
in finance. Market risk, the best-known type of risk, is the risk of change in the value of a financial 
position. Credit risk, also a very important type of risk, is the risk of not receiving repayments 
on outstanding loans, as a result of borrower default. Operational risk, which has received more and
more attention in recent years, is the risk of losses resulting from failed internal processes, people 
and systems, or external events. Liquidity risk occurs when, due to a lack of marketability, an 
investment cannot be bought or sold quickly enough to prevent a loss. Model risk can be defined 
as the risk due to the use of a misspecified model for the risk measurement. 

The need for risk measurement has increased dramatically, in the last two decades, due to the 
introduction of new regulation procedures. In 1996 the Basel Committee on Banking Supervision (a 
committee established by the central bank governors in 1974) prescribed a so-called standardized 
model for market risk. At the same time the Committee allowed the larger financial institutions to 
develop their own internal model. The second Basel Accord (Basel II), initiated in 2001, considers 
operational risk as a new risk class and prescribes the use of finer approaches to assess the risk of 
credit portfolios. By using sophisticated approaches the banks may reduce the amount of regulatory 
capital (the capital required to support the risk), but in the event of frequent losses a larger amount 
may be imposed by the regulator. Parallel developments took place in the insurance sector, giving 
rise to the Solvency projects. 

A risk measure that is used for specifying capital requirements can be thought of as the amount 
of capital that must be added to a position to make its risk acceptable to regulators. Value at risk 
(VaR) is arguably the most widely used risk measure in financial institutions. In 1993, the business 
bank JP Morgan publicized its estimation method, RiskMetrics, for the VaR of a portfolio. VaR 
is now an indispensable tool for banks, regulators and portfolio managers. Hundreds of academic 
and nonacademic papers on VaR may be found at http://www.gloriamundi.org/. We start by
defining VaR and discussing its properties. 


12.3.1 Value at Risk 
Definition 


VaR is concerned with the possible loss of a portfolio in a given time horizon. A natural risk measure 
is the maximum possible loss. However, in most models, the support of the loss distribution is 
unbounded so that the maximum loss is infinite. The concept of VaR replaces the maximum loss 
by a maximum loss which is not exceeded with a given (high) probability. 

VaR should be computed using the predictive distribution of future losses, that is, the conditional distribution of future losses given the current information. However, for horizons $h > 1$, this conditional distribution may be hard to obtain.

To be more specific, consider a portfolio whose value at time $t$ is a random variable denoted $V_t$. At horizon $h$,⁸ the loss is denoted

$$L_{t,t+h} = -(V_{t+h} - V_t).$$

The distribution of $L_{t,t+h}$ is called the loss distribution (conditional or not). This distribution is used to compute the regulatory capital which allows certain risks to be covered, but not all of them. In general, $V_t$ is specified as a function of $d$ unobservable risk factors.
Example 12.6 Suppose, for instance, that the portfolio is composed of $d$ stocks. Denote by $S_{i,t}$ the price of stock $i$ at time $t$ and by $r_{i,t,t+h} = \log S_{i,t+h} - \log S_{i,t}$ the log-return. If $a_i$ is the number of stocks $i$ in the portfolio, we have

$$V_t = \sum_{i=1}^{d}a_iS_{i,t}.$$

Assuming that the composition of the portfolio remains fixed between the dates $t$ and $t+h$, we have

$$L_{t,t+h} = -\sum_{i=1}^{d}a_iS_{i,t}\left(e^{r_{i,t,t+h}} - 1\right).$$

The distribution of $V_{t+h}$ conditional on the available information at time $t$ is called the profit and loss (P&L) distribution.
The determination of reserves depends on

• the portfolio,
• the available information at time $t$ and the horizon $h$,
• a level $\alpha \in (0,1)$ characterizing the acceptable risk.⁹


Denote by $R_{t,h}(\alpha)$ the level of the reserves. Including these reserves, which are not subject to remuneration, the value of the portfolio at time $t+h$ becomes $V_{t+h} + R_{t,h}(\alpha)$. The capital used to support risk, the VaR, also includes the current portfolio value,

$$\mathrm{VaR}_{t,h}(\alpha) = V_t + R_{t,h}(\alpha),$$

and satisfies

$$P_t\left[V_{t+h} - V_t < -\mathrm{VaR}_{t,h}(\alpha)\right] \le \alpha,$$

where $P_t$ is the probability conditional on the information available at time $t$.¹⁰ VaR can thus be interpreted as the capital exposed to risk in the event of bankruptcy. Equivalently,

$$P_t\left[\mathrm{VaR}_{t,h}(\alpha) < L_{t,t+h}\right] \le \alpha, \quad \text{i.e.} \quad P_t\left[L_{t,t+h} \le \mathrm{VaR}_{t,h}(\alpha)\right] \ge 1 - \alpha. \tag{12.53}$$

In probabilistic terms, $\mathrm{VaR}_{t,h}(\alpha)$ is thus simply the $(1-\alpha)$-quantile of the conditional loss distribution. If, for instance, for a confidence level of 99% and a horizon of 10 days, the VaR of a portfolio is €5000, this means that, if the composition of the portfolio does not change, there is a probability of 1% that the potential loss over 10 days will be larger than €5000.


Definition 12.1 The $(1-\alpha)$-quantile of the conditional loss distribution is called the VaR at the level $\alpha$:

$$\mathrm{VaR}_{t,h}(\alpha) := \inf\left\{x \in \mathbb{R} \mid P_t\left[L_{t,t+h} \le x\right] \ge 1 - \alpha\right\},$$

when this quantile is positive. By convention, $\mathrm{VaR}_{t,h}(\alpha) = 0$ otherwise.
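Definition 12.1 applies directly to an empirical loss distribution. The sketch below assumes that a sample of simulated or historical losses is available (the function name is hypothetical); it returns the smallest observation $x$ whose empirical cdf is at least $1 - \alpha$, with the convention that the VaR is 0 when this quantile is not positive. The small tolerance in the ceiling guards against floating-point rounding in $n(1-\alpha)$.

```python
import math


def var_from_sample(losses, alpha):
    """Empirical VaR following Definition 12.1: the smallest x such that
    #{L_i <= x} / n >= 1 - alpha, set to 0 when this quantile is not positive."""
    xs = sorted(losses)
    n = len(xs)
    k = max(1, math.ceil(n * (1.0 - alpha) - 1e-9))  # number of observations that must be <= x
    return max(xs[k - 1], 0.0)
```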


In particular, it is obvious that $\mathrm{VaR}_{t,h}(\alpha)$ is a decreasing function of $\alpha$.
From (12.53), computing a VaR simply reduces to determining a quantile of the conditional 
loss distribution. Figure 12.1 compares the VaR of three distributions, with the same variance but 


8 For market risk management, $h$ is typically 1 or 10 days. For the regulator (concerned with credit or operational risk), $h$ is 1 year.

9 $1 - \alpha$ is often referred to as the confidence level. Typical values of $\alpha$ are 5% or 1%.

10 In the ‘standard’ approach, the conditional distribution is replaced by the unconditional one.
Figure 12.1 VaR is the $(1-\alpha)$-quantile of the conditional loss distribution (left). The right-hand graph displays the VaR as a function of $\alpha \in [1\%, 5\%]$ for a Gaussian distribution $\mathcal{N}$ (solid line), a Student $t$ distribution with 3 degrees of freedom $\mathcal{S}$ (dashed line) and a double exponential distribution $\mathcal{E}$ (thin dotted line). The three laws are standardized so as to have unit variances. For $\alpha = 1\%$ we have $\mathrm{VaR}(\mathcal{N}) < \mathrm{VaR}(\mathcal{S}) < \mathrm{VaR}(\mathcal{E})$, whereas for $\alpha = 5\%$ we have $\mathrm{VaR}(\mathcal{S}) < \mathrm{VaR}(\mathcal{E}) < \mathrm{VaR}(\mathcal{N})$.


different tail thicknesses. The thickest tail, proportional to $1/x^4$, is that of the Student $t$ distribution with 3 degrees of freedom, here denoted $\mathcal{S}$; the thinnest tail, proportional to $e^{-x^2/2}$, is that of the Gaussian $\mathcal{N}$; and the double exponential $\mathcal{E}$ possesses a tail of intermediate size, proportional to $e^{-\sqrt{2}|x|}$. For sufficiently small levels $\alpha$, the VaRs are ranked in the order suggested by the thickness of the tails: $\mathrm{VaR}(\mathcal{N}) < \mathrm{VaR}(\mathcal{E}) < \mathrm{VaR}(\mathcal{S})$. However, the right-hand graph of Figure 12.1 shows that this ranking does not hold for the standard levels $\alpha = 1\%$ or $\alpha = 5\%$.
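The rankings reported in Figure 12.1 can be checked with the following sketch, which computes the $(1-\alpha)$-quantiles of the three unit-variance distributions. It relies on standard facts not stated in the text: the double exponential with unit variance has scale $1/\sqrt{2}$, and the Student $t_3$ cdf has a closed form, inverted here by bisection before rescaling by $1/\sqrt{3}$. The function names are illustrative.

```python
import math
from statistics import NormalDist


def var_normal(alpha):
    """(1 - alpha)-quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha)


def var_double_exp(alpha):
    """(1 - alpha)-quantile of the unit-variance double exponential (scale 1/sqrt(2)), alpha < 1/2."""
    return -math.log(2.0 * alpha) / math.sqrt(2.0)


def t3_cdf(x):
    """Closed-form cdf of the Student t distribution with 3 degrees of freedom."""
    y = x / math.sqrt(3.0)
    return 0.5 + (y / (1.0 + y * y) + math.atan(y)) / math.pi


def var_student3(alpha, lo=0.0, hi=1e3, n_iter=100):
    """(1 - alpha)-quantile of t_3 by bisection, divided by sqrt(3) to get unit variance; alpha < 1/2."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(3.0)
```

Evaluating the three functions at $\alpha = 10^{-4}$ exhibits the tail-based ranking $\mathcal{N} < \mathcal{E} < \mathcal{S}$, whereas $\alpha = 1\%$ and $\alpha = 5\%$ give the orderings stated in the caption.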


VaR and Conditional Moments 


Let us introduce the first two moments of $L_{t,t+h}$ conditional on the information available at time $t$:

$$m_{t,t+h} = E_t(L_{t,t+h}), \qquad \sigma_{t,t+h}^2 = \mathrm{Var}_t(L_{t,t+h}).$$

Suppose that

$$L_{t,t+h} = m_{t,t+h} + \sigma_{t,t+h}L_h^*, \tag{12.54}$$

where $L_h^*$ is a random variable with cumulative distribution function $F_h$. Denote by $F_h^{\leftarrow}$ the quantile function of the variable $L_h^*$, defined as the generalized inverse of $F_h$:

$$F_h^{\leftarrow}(\alpha) = \inf\{x \in \mathbb{R} \mid F_h(x) \ge \alpha\}, \qquad 0 < \alpha < 1.$$

If $F_h$ is continuous and strictly increasing, we simply have $F_h^{\leftarrow}(\alpha) = F_h^{-1}(\alpha)$, where $F_h^{-1}$ is the ordinary inverse of $F_h$. In view of (12.53) and (12.54) it follows that

$$1 - \alpha = P_t\left[\mathrm{VaR}_{t,h}(\alpha) \ge m_{t,t+h} + \sigma_{t,t+h}L_h^*\right] = F_h\left(\frac{\mathrm{VaR}_{t,h}(\alpha) - m_{t,t+h}}{\sigma_{t,t+h}}\right).$$

Consequently,

$$\mathrm{VaR}_{t,h}(\alpha) = m_{t,t+h} + \sigma_{t,t+h}F_h^{\leftarrow}(1 - \alpha). \tag{12.55}$$

VaR can thus be decomposed into an ‘expected loss’ $m_{t,t+h}$, the conditional mean of the loss, and an ‘unexpected loss’ $\sigma_{t,t+h}F_h^{\leftarrow}(1-\alpha)$, also called economic capital.
The apparent simplicity of formula (12.55) masks difficulties in (i) deriving the first conditional moments for a given model, and (ii) determining the law $F_h$, supposedly independent of $t$, of the standardized loss at horizon $h$.

Consider the price of a portfolio, defined as a combination of the prices of $d$ assets, $p_t = a'P_t$, where $a, P_t \in \mathbb{R}^d$. Introducing the price variations $\Delta P_t = P_t - P_{t-1}$, we have

$$L_{t,t+h} = -(p_{t+h} - p_t) = -a'(P_{t+h} - P_t) = -a'\sum_{i=1}^{h}\Delta P_{t+i}.$$


The term structure of the VaR, that is, its evolution as a function of the horizon, can be analyzed 
in different cases. 


Example 12.7 (Independent and identically distributed price variations) If the $\Delta P_{t+i}$ are iid $\mathcal{N}(m, \Sigma)$ distributed, the law of $L_{t,t+h}$ is $\mathcal{N}(-a'mh, a'\Sigma ah)$. In view of (12.55), it follows that

$$\mathrm{VaR}_{t,h}(\alpha) = -a'mh + \sqrt{a'\Sigma a}\,\sqrt{h}\,\Phi^{-1}(1 - \alpha). \tag{12.56}$$

In particular, if $m = 0$, we have

$$\mathrm{VaR}_{t,h}(\alpha) = \sqrt{h}\,\mathrm{VaR}_{t,1}(\alpha). \tag{12.57}$$

The rule that one multiplies the VaR at horizon 1 by $\sqrt{h}$ to obtain the VaR at horizon $h$ is often erroneously used when the price variations are not iid, centered and Gaussian (Exercise 12.12).
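A short sketch of (12.56) for a portfolio of $d$ assets, using plain Python lists for $a$, $m$ and $\Sigma$; the function name and the numbers below are hypothetical. When $m = 0$ the $\sqrt{h}$ rule (12.57) holds exactly, while a nonzero drift breaks it.

```python
import math
from statistics import NormalDist


def var_iid_gaussian(a, m, Sigma, h, alpha):
    """(12.56): VaR_{t,h}(alpha) = -a'm h + sqrt(a' Sigma a) sqrt(h) Phi^{-1}(1 - alpha)."""
    d = len(a)
    a_m = sum(a[i] * m[i] for i in range(d))
    a_Sigma_a = sum(a[i] * Sigma[i][j] * a[j] for i in range(d) for j in range(d))
    return -a_m * h + math.sqrt(a_Sigma_a * h) * NormalDist().inv_cdf(1.0 - alpha)


# Hypothetical two-asset portfolio
a = [10.0, 5.0]
Sigma = [[1.0, 0.3], [0.3, 2.0]]
```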


Example 12.8 (AR(1) price variations) Suppose now that

$$\Delta P_t - m = A(\Delta P_{t-1} - m) + U_t, \qquad (U_t) \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \Sigma),$$

where $A$ is a matrix whose eigenvalues have a modulus strictly less than 1. The process $(\Delta P_t)$ is then stationary with expectation $m$. It can be verified (Exercise 12.13) that

$$\mathrm{VaR}_{t,h}(\alpha) = a'\mu_h + \sqrt{a'\Sigma_h a}\,\Phi^{-1}(1 - \alpha), \tag{12.58}$$

where, letting $A_h = (I - A^h)(I - A)^{-1}$,

$$\mu_h = -mh - AA_h(\Delta P_t - m), \qquad \Sigma_h = \sum_{j=1}^{h}A_{h-j+1}\Sigma A_{h-j+1}'.$$

If $A = 0$, (12.58) reduces to (12.56). Apart from this case, the term multiplying $\Phi^{-1}(1 - \alpha)$ is not proportional to $\sqrt{h}$.
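The scalar case $d = 1$, $a = 1$ of (12.58) is sketched below (the function name is hypothetical), assuming $|A| < 1$ so that $A_k = (1 - A^k)/(1 - A)$ is well defined. With $A = 0$ the function reproduces the $\sqrt{h}$ rule, whereas with $A \ne 0$ the horizon-$h$ VaR departs from $\sqrt{h}\,\mathrm{VaR}_{t,1}(\alpha)$.

```python
import math
from statistics import NormalDist


def var_ar1(A, m, sigma2, dP_t, h, alpha):
    """Scalar version of (12.58) for dP_t - m = A (dP_{t-1} - m) + U_t, U_t ~ N(0, sigma2), |A| < 1."""
    def A_k(k):
        # scalar analog of A_k = (I - A^k)(I - A)^{-1}
        return (1.0 - A ** k) / (1.0 - A)

    mu_h = -m * h - A * A_k(h) * (dP_t - m)                            # conditional mean of the loss
    Sigma_h = sum(A_k(h - j + 1) ** 2 * sigma2 for j in range(1, h + 1))  # conditional variance
    return mu_h + math.sqrt(Sigma_h) * NormalDist().inv_cdf(1.0 - alpha)
```

The coefficient $A_{h-j+1}$ is the cumulated effect of the shock $U_{t+j}$ on $\Delta P_{t+j} + \cdots + \Delta P_{t+h}$, which is why the variance grows faster than $h$ when $A > 0$.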


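Example 12.9 (ARCH(1) price variations) For simplicity let $d = 1$, $a = 1$ and

$$\Delta P_t = \sqrt{\omega + \alpha_1\Delta P_{t-1}^2}\,U_t, \qquad \omega > 0,\ \alpha_1 \ge 0, \qquad (U_t) \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1). \tag{12.59}$$

The conditional law of $L_{t,t+1}$ is $\mathcal{N}(0, \omega + \alpha_1\Delta P_t^2)$. Therefore,

$$\mathrm{VaR}_{t,1}(\alpha) = \sqrt{\omega + \alpha_1\Delta P_t^2}\,\Phi^{-1}(1 - \alpha).$$

Here VaR computation at horizons larger than 1 is problematic. Indeed, the conditional distribution of $L_{t,t+h}$ is no longer Gaussian when $h > 1$ (Exercise 12.14).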
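Since the $h$-step conditional distribution is not Gaussian, one practical option (not prescribed in the text) is to simulate it. The sketch below simulates (12.59) forward from the current $\Delta P_t$ and takes the empirical $(1-\alpha)$-quantile of the cumulated loss; for $h = 1$ it should agree with the closed form above. The number of paths, the seed and the function names are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def var_arch1_one_step(omega, alpha1, dP_t, alpha):
    """Closed form: VaR_{t,1}(alpha) = sqrt(omega + alpha1 * dP_t^2) * Phi^{-1}(1 - alpha)."""
    return math.sqrt(omega + alpha1 * dP_t ** 2) * NormalDist().inv_cdf(1.0 - alpha)


def var_arch1_mc(omega, alpha1, dP_t, h, alpha, n_paths=100_000, seed=0):
    """Monte Carlo VaR_{t,h}(alpha) for the ARCH(1) model (12.59) with d = 1, a = 1."""
    rng = random.Random(seed)
    losses = []
    for _path in range(n_paths):
        prev, cumulated = dP_t, 0.0
        for _step in range(h):
            prev = math.sqrt(omega + alpha1 * prev ** 2) * rng.gauss(0.0, 1.0)  # next price variation
            cumulated += prev
        losses.append(-cumulated)  # L_{t,t+h} = -(P_{t+h} - P_t)
    losses.sort()
    k = math.ceil(n_paths * (1.0 - alpha) - 1e-9)  # position of the empirical (1 - alpha)-quantile
    return losses[k - 1]
```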


It is often more convenient to work with the log-returns $r_t = \Delta_1\log p_t$, assumed to be stationary, than with the price variations. Letting $q_t(h, \alpha)$ be the $\alpha$-quantile of the conditional distribution of the future returns $r_{t+1} + \cdots + r_{t+h}$, we obtain (Exercise 12.15)

$$\mathrm{VaR}_{t,h}(\alpha) = \left\{1 - e^{q_t(h,\alpha)}\right\}p_t. \tag{12.60}$$
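Formula (12.60) only requires the $\alpha$-quantile of the cumulated log-return. In the sketch below this quantile is computed under the purely illustrative assumption that $r_{t+1} + \cdots + r_{t+h}$ is conditionally Gaussian; any other conditional model could supply $q_t(h,\alpha)$ instead. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def var_from_return_quantile(p_t, q):
    """(12.60): VaR_{t,h}(alpha) = {1 - exp(q_t(h, alpha))} p_t."""
    return (1.0 - math.exp(q)) * p_t


def gaussian_return_quantile(mean, std, alpha):
    """alpha-quantile of the cumulated log-return, assuming (for illustration only)
    that r_{t+1} + ... + r_{t+h} is N(mean, std^2) given the information at time t."""
    return NormalDist(mean, std).inv_cdf(alpha)
```

The loss $L_{t,t+h} = p_t(1 - e^{r_{t+1} + \cdots + r_{t+h}})$ is a decreasing function of the cumulated return, which is why the $\alpha$-quantile of the return maps to the $(1-\alpha)$-quantile of the loss.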
Lack of Subadditivity of VaR


VaR is often criticized because it does not satisfy the ‘subadditivity’ property for every distribution of the price variations. Subadditivity means that the VaR of two portfolios after they have been merged should be no greater than the sum of their VaRs before they were merged. However, this property does not hold in general: if $L_1$ and $L_2$ are two loss variables, we do not necessarily have, in obvious notation,

$$\mathrm{VaR}_{t,h}^{L_1+L_2}(\alpha) \le \mathrm{VaR}_{t,h}^{L_1}(\alpha) + \mathrm{VaR}_{t,h}^{L_2}(\alpha), \qquad \forall \alpha, t, h. \tag{12.61}$$


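Example 12.10 (Pareto distribution) Let $L_1$ and $L_2$ be two independent variables, Pareto distributed, with density $f(x) = (2 + x)^{-2}\mathbb{1}_{x > -1}$. The cdf of this distribution is $F(x) = \left(1 - (2 + x)^{-1}\right)\mathbb{1}_{x > -1}$, whence the VaR at level $\alpha$ is $\mathrm{VaR}(\alpha) = \alpha^{-1} - 2$. It can be verified, for instance using Mathematica, that

$$P[L_1 + L_2 \le x] = 1 - \frac{2}{4 + x} - \frac{2\log(3 + x)}{(4 + x)^2}, \qquad x > -2.$$

Thus

$$P\left[L_1 + L_2 \le 2\,\mathrm{VaR}(\alpha)\right] = 1 - \alpha - \frac{\alpha^2}{2}\log\frac{2 - \alpha}{\alpha} < 1 - \alpha.$$

It follows that $\mathrm{VaR}_{L_1+L_2}(\alpha) > \mathrm{VaR}_{L_1}(\alpha) + \mathrm{VaR}_{L_2}(\alpha) = 2\,\mathrm{VaR}_{L_1}(\alpha)$, $\forall \alpha \in\, ]0,1[$. If, for instance, $\alpha = 0.01$, we find $\mathrm{VaR}_{L_1}(0.01) = 98$ and, numerically, $\mathrm{VaR}_{L_1+L_2}(0.01) \approx 201.2$.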
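The numbers in Example 12.10 can be reproduced without symbolic software. The sketch below uses the closed-form cdf of $L_1 + L_2$ given above and a bisection search for its $(1-\alpha)$-quantile; the bracket, the iteration count and the function names are arbitrary choices made for illustration.

```python
import math


def pareto_var(alpha):
    """VaR of one loss with density (2 + x)^(-2) on (-1, inf), whose cdf is 1 - 1/(2 + x)."""
    return 1.0 / alpha - 2.0


def sum_cdf(x):
    """P[L1 + L2 <= x] for two independent such losses (valid for x > -2)."""
    return 1.0 - 2.0 / (4.0 + x) - 2.0 * math.log(3.0 + x) / (4.0 + x) ** 2


def sum_var(alpha, lo=-2.0 + 1e-12, hi=1e9, n_iter=200):
    """(1 - alpha)-quantile of L1 + L2 by bisection on the increasing cdf."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```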


This lack of subadditivity can be interpreted as a nonconvexity with respect to the composition of 
the portfolio. It means that the risk of a portfolio, when measured by the VaR, may be larger than 
the sum of the risks of each of its components (even when these components are independent, 
except in the Gaussian case). Risk management with VaR thus does not encourage diversification. 


12.3.2 Other Risk Measures 


Even if VaR is the most widely used risk measure, the choice of an adequate risk measure is an 
open issue. As already seen, the convexity property, with respect to the portfolio composition, is not 
satisfied for VaR with some distributions of the loss variable. In what follows, we present several 
alternatives to VaR, together with a conceptualization of the ‘expected’ properties of risk measures.


Volatility and Moments 


In the Markowitz (1952) portfolio theory, the variance is used as a risk measure. It might then seem 
natural, in a dynamic framework, to use the volatility as a risk measure. However, volatility does 
not take into account the signs of the differences from the conditional mean. More importantly, this 
measure does not satisfy some ‘coherency’ properties, as will be seen later (translation invariance, 
subadditivity). 


Expected Shortfall 


The expected shortfall (ES), or anticipated loss, is the standard risk measure used in insurance since Solvency II. This risk measure is closely related to VaR, but avoids some of its conceptual difficulties (in particular, the lack of subadditivity). It is more sensitive than VaR to the shape of the conditional loss distribution in the tail of the distribution. In contrast to VaR, it is informative about the expected loss when a big loss occurs.
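Let $L_{t,t+h}$ be such that $E|L_{t,t+h}| < \infty$. In this section, the conditional distribution of $L_{t,t+h}$ is 
assumed to have a continuous and strictly increasing cdf. The ES at level $\alpha$, also referred to as 
TailVaR, is defined as the conditional expectation of the loss given that the loss exceeds the VaR: 

$$\mathrm{ES}_{t,h}(\alpha) := E_t\left[L_{t,t+h} \mid L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)\right]. \tag{12.62}$$

We have 

$$E_t\left[L_{t,t+h}\mathbb{1}_{\{L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)\}}\right] = E_t\left[L_{t,t+h} \mid L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)\right] P_t\left[L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)\right].$$

Now $P_t[L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)] = 1 - P_t[L_{t,t+h} \le \mathrm{VaR}_{t,h}(\alpha)] = 1 - (1-\alpha) = \alpha$, where the next-to-last 
equality follows from the continuity of the cdf at $\mathrm{VaR}_{t,h}(\alpha)$. Thus 

$$\mathrm{ES}_{t,h}(\alpha) = \frac{1}{\alpha} E_t\left[L_{t,t+h}\mathbb{1}_{\{L_{t,t+h} > \mathrm{VaR}_{t,h}(\alpha)\}}\right]. \tag{12.63}$$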


The following characterization also holds (Exercise 12.16): 

$$\mathrm{ES}_{t,h}(\alpha) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_{t,h}(u)\,du. \tag{12.64}$$


ES thus can be interpreted, for a given level $\alpha$, as the mean of the VaR over all levels $u \le \alpha$. 
Obviously, $\mathrm{ES}_{t,h}(\alpha) \ge \mathrm{VaR}_{t,h}(\alpha)$. 

Note that the integral representation makes $\mathrm{ES}_{t,h}(\alpha)$ a continuous function of $\alpha$, whatever the 
distribution (continuous or not) of the loss variable. VaR does not satisfy this property (for loss 
variables which have a zero mass over certain intervals). 


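Example 12.11 (The Gaussian case) If the conditional loss distribution is $\mathcal{N}(m_{t,t+h}, \sigma^2_{t,t+h})$ 
then, by (12.55), $\mathrm{VaR}_{t,h}(\alpha) = m_{t,t+h} + \sigma_{t,t+h}\Phi^{-1}(1-\alpha)$, where $\Phi$ is the $\mathcal{N}(0,1)$ cdf. Using 
(12.62) and introducing $L^*$, a variable of law $\mathcal{N}(0,1)$, we have 

$$\begin{aligned}
\mathrm{ES}_{t,h}(\alpha) &= m_{t,t+h} + \sigma_{t,t+h}\, E\left[L^* \mid L^* > \Phi^{-1}(1-\alpha)\right] \\
&= m_{t,t+h} + \sigma_{t,t+h}\,\frac{1}{\alpha}\, E\left[L^*\mathbb{1}_{\{L^* > \Phi^{-1}(1-\alpha)\}}\right] \\
&= m_{t,t+h} + \sigma_{t,t+h}\,\frac{1}{\alpha}\,\phi\{\Phi^{-1}(1-\alpha)\},
\end{aligned}$$

where $\phi$ is the density of the standard Gaussian, and the last equality uses $\int_c^{\infty} x\phi(x)\,dx = \phi(c)$. For instance, if $\alpha = 0.05$, the conditional standard 
deviation is multiplied by 1.65 in the VaR formula, and by 2.06 in the ES formula. 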
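The two multipliers quoted in Example 12.11 can be checked numerically. The sketch below relies only on Python's standard-library `statistics.NormalDist`; the function name `gaussian_var_es` is our own illustrative choice, not notation from the text.

```python
from statistics import NormalDist

def gaussian_var_es(m, sigma, alpha):
    """VaR and ES at level alpha when the conditional loss is N(m, sigma^2).

    Implements VaR = m + sigma * Phi^{-1}(1 - alpha) and
    ES = m + sigma * phi(Phi^{-1}(1 - alpha)) / alpha, as in Example 12.11.
    """
    std = NormalDist()                   # standard Gaussian N(0, 1)
    z = std.inv_cdf(1 - alpha)           # Phi^{-1}(1 - alpha)
    var = m + sigma * z
    es = m + sigma * std.pdf(z) / alpha  # phi(z) / alpha multiplies sigma
    return var, es

var, es = gaussian_var_es(0.0, 1.0, 0.05)
print(f"{var:.3f} {es:.3f}")  # prints "1.645 2.063"
```

Because both quantities are affine in $(m, \sigma)$, the standardized case $m = 0$, $\sigma = 1$ gives the multipliers directly.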


More generally, we have under assumption (12.54), in view of (12.64) and (12.55), 

$$\mathrm{ES}_{t,h}(\alpha) = m_{t,t+h} + \frac{\sigma_{t,t+h}}{\alpha}\int_0^\alpha F^{-1}(1-u)\,du. \tag{12.65}$$
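Formula (12.65) only requires the quantile function $F^{-1}$ of the standardized loss. As a sketch, assume (our choice, for illustration only) that $F$ is a Laplace distribution rescaled to unit variance, scale $b = 1/\sqrt{2}$, whose quantile is explicit. The integral is computed by the midpoint rule and compared with the closed form $b\{1 - \log(2\alpha)\}$, obtained by integrating $-b\log(2u)$ and valid for $\alpha \le 1/2$. The helper names are hypothetical.

```python
import math

B = 1 / math.sqrt(2)  # Laplace scale giving unit variance (variance = 2 * B**2)

def laplace_quantile(p):
    """Quantile F^{-1}(p) of the Laplace(0, B) distribution, 0 < p < 1."""
    if p < 0.5:
        return B * math.log(2 * p)
    return -B * math.log(2 * (1 - p))

def es_location_scale(m, sigma, alpha, quantile, n=100_000):
    """ES = m + (sigma / alpha) * int_0^alpha F^{-1}(1 - u) du, midpoint rule."""
    h = alpha / n
    integral = h * sum(quantile(1 - (i + 0.5) * h) for i in range(n))
    return m + sigma * integral / alpha

alpha = 0.05
es_numeric = es_location_scale(0.0, 1.0, alpha, laplace_quantile)
es_exact = B * (1 - math.log(2 * alpha))   # closed form for alpha <= 1/2
var_laplace = laplace_quantile(1 - alpha)  # VaR of the standardized loss
print(f"VaR {var_laplace:.3f}, ES {es_numeric:.3f} (exact {es_exact:.3f})")
```

With these choices the VaR (about 1.628) is slightly below the Gaussian value 1.645 of Example 12.11, whereas the ES (about 2.335) exceeds the Gaussian 2.063: the heavier Laplace tail is visible in the ES but not in the VaR.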


Distortion Risk Measures 


Continue to assume that the cdf $F$ of the loss distribution is continuous and strictly increasing. 
For notational simplicity, we omit the indices $t$ and $h$. From (12.64), the ES is written as 

$$\mathrm{ES}(\alpha) = \int_0^1 F^{-1}(1-u)\,\mathbb{1}_{[0,\alpha]}(u)\,\frac{1}{\alpha}\,du,$$

where the term $\mathbb{1}_{[0,\alpha]}/\alpha$ can be interpreted as the density of the uniform distribution over $[0,\alpha]$. 
More generally, a distortion risk measure (DRM) is defined as a number 

$$r(F; G) = \int_0^1 F^{-1}(1-u)\,dG(u),$$

where $G$ is a cdf on $[0,1]$, called distortion function, and $F$ is the loss distribution. The introduction 
of a probability distribution on the confidence levels is often interpreted in terms of optimism or 
pessimism. If $G$ admits a density $g$ which is increasing on $[0,1]$, that is, if $G$ is convex, the weight 
of the quantile $F^{-1}(1-u)$ increases with $u$: large risks receive small weights with this choice of 
$G$. Conversely, if $g$ is decreasing, that is, if $G$ is concave, large risks receive the bigger weights. 

VaR at level $\alpha$ is a DRM, obtained by taking for $G$ the cdf of the Dirac mass at $\alpha$. As we have seen, the 
ES corresponds to the constant density $g = 1/\alpha$ on $[0,\alpha]$: it is simply an average over all levels below $\alpha$. 

A family of DRMs is obtained by parameterizing the distortion measure as 

$$r_p(F; G) = \int_0^1 F^{-1}(1-u)\,dG_p(u),$$

where the parameter $p$ reflects the confidence level, that is, the degree of optimism in the face of 
risk. 


Example 12.12 (Exponential DRM) Let 

$$G_p(u) = \frac{1 - e^{-pu}}{1 - e^{-p}},$$

where $p \in\, ]0, +\infty[$. We have 

$$r_p(F; G) = \int_0^1 F^{-1}(1-u)\,\frac{pe^{-pu}}{1 - e^{-p}}\,du.$$

The density function $g_p$ is decreasing whatever the value of $p$, which corresponds to an overweighting 
of the larger risks. 
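A DRM is a weighted integral of quantiles, so it can be approximated by the midpoint rule once $F^{-1}$ and the density $g$ of $G$ are available. The sketch below assumes, for illustration only, a standard Gaussian loss; the helper names are hypothetical. It recovers the ES of Example 12.11 with the uniform density on $[0,\alpha]$, and evaluates the exponential DRM of Example 12.12 for several values of $p$.

```python
import math
from statistics import NormalDist

def drm(quantile, density, n=20_000):
    """r(F; G) = int_0^1 F^{-1}(1 - u) g(u) du by the midpoint rule, with g = G'."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += quantile(1 - u) * density(u)
    return total * h

def exponential_density(p):
    """g_p(u) = p e^{-pu} / (1 - e^{-p}), the density of G_p in Example 12.12."""
    return lambda u: p * math.exp(-p * u) / -math.expm1(-p)

def uniform_density(alpha):
    """g(u) = 1_{[0, alpha]}(u) / alpha: with this choice the DRM is the ES."""
    return lambda u: 1.0 / alpha if u <= alpha else 0.0

q = NormalDist().inv_cdf  # loss assumed N(0, 1) for this illustration
es = drm(q, uniform_density(0.05))
r = {p: drm(q, exponential_density(p)) for p in (1e-6, 1.0, 5.0, 20.0)}
print(round(es, 3), {p: round(v, 3) for p, v in r.items()})
```

As $p \to 0$ the distortion becomes uniform on $[0,1]$ and $r_p$ tends to the expected loss (zero here); as $p$ grows, the weight concentrates on the largest quantiles and $r_p$ increases, in line with the overweighting of large risks noted above.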

Coherent Risk Measures 

In response to criticisms of VaR, several notions of coherent risk measures have been introduced. 


One of the proposed definitions is the following. 


Definition 12.2 Let $\mathcal{L}$ denote a set of real random loss variables defined on a measurable space 
$(\Omega, \mathcal{A})$. Suppose that $\mathcal{L}$ contains all the variables that are almost surely constant and is closed 
under addition and multiplication by scalars. A mapping $\rho: \mathcal{L} \to \mathbb{R}$ is called a coherent risk 
measure if it has the following properties: 

1. Monotonicity: $\forall L_1, L_2 \in \mathcal{L}$, $L_1 \le L_2 \Rightarrow \rho(L_1) \le \rho(L_2)$. 
2. Subadditivity: $\forall L_1, L_2 \in \mathcal{L}$, $L_1 + L_2 \in \mathcal{L} \Rightarrow \rho(L_1 + L_2) \le \rho(L_1) + \rho(L_2)$. 
3. Positive homogeneity: $\forall L \in \mathcal{L}$, $\forall \lambda \ge 0$, $\rho(\lambda L) = \lambda\rho(L)$. 
4. Translation invariance: $\forall L \in \mathcal{L}$, $\forall c \in \mathbb{R}$, $\rho(L + c) = \rho(L) + c$. 


This definition has the following immediate consequences: 

1. $\rho(0) = 0$, using the homogeneity property with $L = 0$. More generally, $\rho(c) = c$ for all 
constants $c$ (if a loss of amount $c$ is certain, a cash amount $c$ should be added to the 
portfolio). 
2. If $L \ge 0$, then $\rho(L) \ge 0$. If a loss is certain, an amount of capital must be added to the 
position. 
3. $\rho(L - \rho(L)) = 0$, that is, the deterministic amount $\rho(L)$ cancels the risk of $L$. 


These requirements are not satisfied for most risk measures used in finance. The variance, or 
more generally any risk measure based on the centered moments of the loss distribution, does 
not satisfy the monotonicity property, for instance. The expectation can be seen as a coherent, but 
uninteresting, risk measure. VaR satisfies all conditions except subadditivity: we have seen that this 
property holds for (dependent or independent) Gaussian variables, but not for general variables. 
ES is a coherent risk measure in the sense of Definition 12.2 (Exercise 12.17). It can be shown 
(see Wang and Dhaene, 1998) that DRMs with G concave satisfy the subadditivity requirement. 
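The failure of subadditivity for VaR, and its restoration by ES, can be seen on a standard toy example that is not taken from the text: two independent positions, each losing 100 with probability 4% and nothing otherwise (hypothetical numbers). The sketch computes VaR as the $(1-\alpha)$-quantile of the loss, and ES through the integral form (12.64), which remains meaningful for such discrete losses. All helper names are our own.

```python
def var_level(dist, alpha):
    """VaR at level alpha: smallest x with P(L <= x) >= 1 - alpha (dist maps loss -> prob)."""
    cum = 0.0
    for loss, prob in sorted(dist.items()):
        cum += prob
        if cum >= 1 - alpha - 1e-12:  # small tolerance for floating-point sums
            return loss
    return max(dist)

def es_level(dist, alpha):
    """ES via (12.64): (1/alpha) times the integral of the quantiles over the worst alpha mass."""
    remaining, total = alpha, 0.0
    for loss, prob in sorted(dist.items(), reverse=True):
        take = min(prob, remaining)
        total += take * loss
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha

def add_independent(d1, d2):
    """Distribution of L1 + L2 for independent discrete losses."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

loan = {0.0: 0.96, 100.0: 0.04}  # hypothetical position
both = add_independent(loan, loan)
alpha = 0.05
print(var_level(loan, alpha), var_level(both, alpha))  # 0.0 100.0
print(es_level(loan, alpha), es_level(both, alpha))
```

Each position alone has a 5% VaR of 0, but the sum has a VaR of 100, so $\mathrm{VaR}(L_1 + L_2) > \mathrm{VaR}(L_1) + \mathrm{VaR}(L_2)$; the ES values (approximately 80 for each position and 103.2 for the sum) do satisfy subadditivity.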


12.3.3 Estimation Methods 
Unconditional VaR 


The simplest estimation method is based on the $K$ last returns at horizon $h$, that is, $r_{t+h-i}(h) = 
\log(p_{t+h-i}/p_{t-i})$, for $i = h, \dots, h+K-1$. These $K$ returns are viewed as scenarios for future 
returns. The nonparametric historical VaR is simply obtained by replacing $q_t(h,\alpha)$ in (12.60) 
by the empirical $\alpha$-quantile of the last $K$ returns. Typical values are $K = 250$ and $\alpha = 1\%$, 
which means that the third worst return is used as the empirical quantile. A parametric version 
is obtained by fitting a particular distribution to the returns, for example a Gaussian $\mathcal{N}(\mu, \sigma^2)$, 
which amounts to replacing $q_t(h,\alpha)$ by $\hat\mu + \hat\sigma\Phi^{-1}(\alpha)$, where $\hat\mu$ and $\hat\sigma$ are the estimated mean 
and standard deviation. Apart from the (somewhat unrealistic) case where the returns are iid, these 
methods have little theoretical justification. 
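A minimal sketch of both unconditional estimators follows. It assumes that (12.60) has the form $\mathrm{VaR}_t(h,\alpha) = \{1 - e^{q_t(h,\alpha)}\}p_t$, the form used in the RiskMetrics formula of this section, and it takes the empirical $\alpha$-quantile to be the $\lceil K\alpha\rceil$-th smallest return (our convention, which gives the third worst return for $K = 250$, $\alpha = 1\%$). Function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def historical_var(returns, alpha, price):
    """Historical VaR: q_t(h, alpha) replaced by an empirical alpha-quantile.

    Convention (an assumption): the ceil(K * alpha)-th smallest of the K returns.
    """
    k = max(1, math.ceil(len(returns) * alpha))
    q = sorted(returns)[k - 1]
    return (1 - math.exp(q)) * price

def gaussian_var(returns, alpha, price):
    """Parametric version: q_t(h, alpha) replaced by mu_hat + sigma_hat * Phi^{-1}(alpha)."""
    mu_hat = statistics.mean(returns)
    sigma_hat = statistics.stdev(returns)
    q = mu_hat + sigma_hat * NormalDist().inv_cdf(alpha)
    return (1 - math.exp(q)) * price

# Synthetic illustration: 250 returns whose three worst values are -5%, -4%, -3%.
rets = [-0.05, -0.04, -0.03] + [0.001] * 247
print(historical_var(rets, 0.01, 1e6), gaussian_var(rets, 0.01, 1e6))
```

In this example the historical estimate uses the third worst return, $-3\%$, so it equals $(1 - e^{-0.03}) \times 10^6$.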


RiskMetrics Model 


A popular estimation method for the conditional VaR relies on the RiskMetrics model. This model 
is defined by the equations 

$$\begin{cases} r_t = \log(p_t/p_{t-1}) = \epsilon_t = \sigma_t\eta_t, & (\eta_t) \text{ iid } \mathcal{N}(0,1), \\ \sigma_t^2 = \lambda\sigma_{t-1}^2 + (1-\lambda)\epsilon_{t-1}^2, \end{cases} \tag{12.66}$$

where $\lambda \in\, ]0,1[$ is a smoothing parameter, for which, according to RiskMetrics (Longerstaey, 1996), 
a reasonable choice is $\lambda = 0.94$ for daily series. Thus, $\sigma_t^2$ is simply the prediction of $\epsilon_t^2$ obtained 
by simple exponential smoothing. This model can also be viewed as an IGARCH(1, 1) without 
intercept. It is worth noting that no nondegenerate solution $(r_t)_{t\in\mathbb{Z}}$ to (12.66) exists (Exercise 
12.18). Thus, (12.66) is not a realistic data generating process for any usual financial series. This 
model can, however, be used as a simple tool for VaR computation. From (12.60), we get 

$$\mathrm{VaR}_t(1,\alpha) = \left\{1 - e^{\sigma_{t+1}\Phi^{-1}(\alpha)}\right\}p_t.$$
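The RiskMetrics recursion and the one-day VaR formula translate directly into a few lines. In this sketch the function name and the default initial value (the mean of the squared returns) are assumptions; the numerical illustration later in the section uses the mean of the last 250 squared in-sample returns.

```python
import math
from statistics import NormalDist

def riskmetrics_var(returns, alpha, price, lam=0.94, sigma2_init=None):
    """One-day VaR from the RiskMetrics recursion (12.66).

    Each observed return updates sigma2 <- lam * sigma2 + (1 - lam) * r**2, so after
    the loop sigma2 equals sigma^2_{t+1}. Returns (VaR_t(1, alpha), sigma^2_{t+1}).
    """
    if sigma2_init is None:  # assumed default initial value
        sigma2_init = sum(r * r for r in returns) / len(returns)
    sigma2 = sigma2_init
    for r in returns:
        sigma2 = lam * sigma2 + (1 - lam) * r * r
    q = math.sqrt(sigma2) * NormalDist().inv_cdf(alpha)  # sigma_{t+1} * Phi^{-1}(alpha)
    return (1 - math.exp(q)) * price, sigma2

var1, s2 = riskmetrics_var([0.01, -0.02, 0.005], alpha=0.01, price=1e6, sigma2_init=1e-4)
print(var1, s2)
```

Since $\sigma_{t+1}^2 \in \Omega_t$, the only state to carry from one day to the next is the smoothed variance.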


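Let $\Omega_t$ denote the information generated by $\epsilon_t, \epsilon_{t-1}, \dots, \epsilon_1$. Choosing an arbitrary initial value for 
$\sigma_1^2$, we obtain $\sigma_{t+1}^2 \in \Omega_t$ and 

$$E(\sigma_{t+i}^2 \mid \Omega_t) = E\left\{\lambda\sigma_{t+i-1}^2 + (1-\lambda)\sigma_{t+i-1}^2\eta_{t+i-1}^2 \mid \Omega_t\right\} = E(\sigma_{t+i-1}^2 \mid \Omega_t) = \dots = \sigma_{t+1}^2$$

for $i \ge 2$, since $\eta_{t+i-1}$ is independent of $\sigma_{t+i-1}$ and of $\Omega_t$, with $E\eta_{t+i-1}^2 = 1$. Because the returns are martingale increments, it follows that $\mathrm{Var}(r_{t+1} + \dots + r_{t+h} \mid \Omega_t) = h\sigma_{t+1}^2$. Note, however, that the conditional 
distribution of $r_{t+1} + \dots + r_{t+h}$ is not exactly $\mathcal{N}(0, h\sigma_{t+1}^2)$ (Exercise 12.19). Many practitioners 
nevertheless systematically use the erroneous formula 

$$\mathrm{VaR}_t(h,\alpha) = \sqrt{h}\,\mathrm{VaR}_t(1,\alpha). \tag{12.67}$$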
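The claim that $r_{t+1} + \dots + r_{t+h}$ has the right conditional variance but is not conditionally Gaussian can be checked by simulating (12.66) forward from a given $\sigma_{t+1}^2$. In the sketch below, the function names, seed and number of simulations are arbitrary choices. For $h = 2$ the sample variance is close to $2\sigma_{t+1}^2$, while the sample kurtosis exceeds the Gaussian value 3.

```python
import math
import random

def simulate_h_returns(sigma2_next, h, lam=0.94, n_sims=200_000, seed=1):
    """Draw r_{t+1} + ... + r_{t+h} under (12.66), starting from sigma^2_{t+1}."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_sims):
        sigma2, total = sigma2_next, 0.0
        for _ in range(h):
            eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)  # eps = sigma * eta
            total += eps
            sigma2 = lam * sigma2 + (1 - lam) * eps * eps  # RiskMetrics update
        sums.append(total)
    return sums

def variance_and_kurtosis(xs):
    """Sample variance and kurtosis (fourth central moment over squared variance)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m2, m4 / m2 ** 2

sums = simulate_h_returns(1e-4, h=2)
var2, kurt2 = variance_and_kurtosis(sums)
print(var2 / 2e-4, kurt2)  # ratio near 1, kurtosis above 3
```

The excess kurtosis comes from the random factor $\lambda + (1-\lambda)\eta_{t+1}^2$ multiplying $\sigma_{t+1}^2$ in $\sigma_{t+2}^2$: the two-day return is a scale mixture of Gaussians, so quantiles computed with (12.67) are in general not exact.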
GARCH-Based Estimation 


Of course, one can use more sophisticated GARCH-type models, rather than the degenerate version 
of RiskMetrics. To estimate $\mathrm{VaR}_t(1,\alpha)$ it suffices to use (12.60) and to estimate $q_t(1,\alpha)$ by 
$\hat\sigma_{t+1}\hat F^{-1}(\alpha)$, where $\hat\sigma_{t+1}^2$ is the conditional variance estimated by a GARCH-type model (for instance, 
an EGARCH or TGARCH to account for the leverage effect; see Chapter 10), and $\hat F$ is an estimate 
of the distribution of the normalized residuals. It is, however, important to note that, even for a 
simple Gaussian GARCH(1, 1), there is no explicit formula available for computing $q_t(h,\alpha)$ when 
$h > 1$. Apart from the case $h = 1$, simulations are required to evaluate this quantile (but, as can 
be seen from Exercise 12.19, this should also be the case with the RiskMetrics method). The 
following procedure may then be suggested: 


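(a) Fit a model, for instance a GARCH(1, 1), on the observed returns $r_t = \epsilon_t$, $t = 1, \dots, n$, 
and deduce the estimated volatility $\hat\sigma_t^2$ for $t = 1, \dots, n+1$. 

(b) Simulate a large number $N$ of scenarios for $\epsilon_{n+1}, \dots, \epsilon_{n+h}$ by iterating, independently for 
$i = 1, \dots, N$, the following three steps: 
(b1) simulate the values $\eta_{n+1}^{(i)}, \dots, \eta_{n+h}^{(i)}$ iid with the distribution $\hat F$; 
(b2) set $\sigma_{n+1}^{(i)} = \hat\sigma_{n+1}$ and $\epsilon_{n+1}^{(i)} = \sigma_{n+1}^{(i)}\eta_{n+1}^{(i)}$; 
(b3) for $k = 2, \dots, h$, set $(\sigma_{n+k}^{(i)})^2 = \hat\omega + \hat\alpha(\epsilon_{n+k-1}^{(i)})^2 + \hat\beta(\sigma_{n+k-1}^{(i)})^2$ and $\epsilon_{n+k}^{(i)} = \sigma_{n+k}^{(i)}\eta_{n+k}^{(i)}$. 

(c) Determine the empirical quantile of the simulations $\epsilon_{n+1}^{(i)} + \dots + \epsilon_{n+h}^{(i)}$, $i = 1, \dots, N$. 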


The distribution $\hat F$ can be obtained parametrically or nonparametrically. A simple nonparametric 
method involves taking for $\hat F$ the empirical distribution of the standardized residuals $r_t/\hat\sigma_t$, which 
amounts to taking, in step (b1), a bootstrap sample of the standardized residuals. 
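Steps (b) and (c), with the bootstrap choice of $\hat F$, can be sketched as follows. The GARCH(1, 1) estimates `omega`, `a`, `b`, the one-step variance $\hat\sigma_{n+1}^2$ and the standardized residuals are assumed to come from step (a), which is not shown. In the usage lines, Gaussian draws stand in for real residuals and all parameter values are hypothetical. The mapping to VaR uses the form $\{1 - e^{q}\}p_t$ assumed for (12.60), and the empirical quantile is taken as the $\lceil N\alpha\rceil$-th smallest simulated value (our convention).

```python
import math
import random

def garch_mc_var(omega, a, b, sigma2_next, std_resid, h, alpha, price,
                 n_scen=20_000, seed=0):
    """VaR at horizon h by simulating GARCH(1,1) scenarios (steps (b)-(c))."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_scen):
        sigma2, total = sigma2_next, 0.0              # (b2): start from sigma_hat^2_{n+1}
        for _ in range(h):
            eta = rng.choice(std_resid)               # (b1): bootstrap draw from F_hat
            eps = math.sqrt(sigma2) * eta
            total += eps
            sigma2 = omega + a * eps * eps + b * sigma2  # (b3): next conditional variance
        sums.append(total)
    sums.sort()
    q = sums[max(1, math.ceil(n_scen * alpha)) - 1]   # (c): empirical alpha-quantile
    return (1 - math.exp(q)) * price

# Hypothetical inputs: Gaussian draws stand in for the standardized residuals.
rng = random.Random(42)
resid = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
v1 = garch_mc_var(1e-6, 0.05, 0.90, 1e-4, resid, h=1, alpha=0.01, price=1e6)
v10 = garch_mc_var(1e-6, 0.05, 0.90, 1e-4, resid, h=10, alpha=0.01, price=1e6)
print(v1, v10)
```

Drawing each $\eta$ inside the loop is equivalent to drawing the whole vector in step (b1) first, since the draws are iid.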


Assessment of the Estimated VaR (Backtesting) 


The Basel accords allow financial institutions to develop their own internal procedures to evaluate 
their techniques for risk measurement. The term ‘backtesting’ refers to procedures that examine, on 
a test (out-of-sample) period, the observed violations of the VaR (or of any other risk measure), the 
latter being computed from a model estimated on an earlier (in-sample) period. 

To fix ideas, define the variables corresponding to the violations of VaR (‘hit variables’) 

$$I_{t+1}(\alpha) = \mathbb{1}_{\{L_{t,t+1} > \mathrm{VaR}_t(1,\alpha)\}}.$$

Ideally, we should have 

$$\frac{1}{n}\sum_{t=1}^n I_{t+1}(\alpha) \simeq \alpha \quad\text{and}\quad \frac{1}{n}\sum_{t=1}^n \mathrm{VaR}_t(1,\alpha) \text{ minimal},$$

that is, a correct proportion of effective losses which violate the estimated VaRs, with a minimal 
average cost. 
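The two backtesting criteria are simple averages once losses and VaR forecasts are aligned in time. In the sketch below, the function name and the alignment convention (entry $t$ of each list refers to the period from $t$ to $t+1$) are our own assumptions.

```python
def backtest(losses, var_forecasts):
    """Hit variables I_{t+1} = 1{L_{t,t+1} > VaR_t(1, alpha)} and the two averages.

    losses[t] is the realized loss over (t, t+1]; var_forecasts[t] is VaR_t(1, alpha)
    computed with information up to t.
    """
    if len(losses) != len(var_forecasts):
        raise ValueError("losses and VaR forecasts must have the same length")
    hits = [1 if loss > v else 0 for loss, v in zip(losses, var_forecasts)]
    n = len(hits)
    hit_rate = sum(hits) / n               # should be close to alpha
    average_var = sum(var_forecasts) / n   # average capital cost, to keep small
    return hit_rate, average_var, hits

rate, avg, hits = backtest([1.0, 5.0, 2.0, 10.0], [3.0, 3.0, 3.0, 3.0])
print(rate, avg, hits)  # 0.5 3.0 [0, 1, 0, 1]
```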


Numerical Illustration 


Consider a portfolio constituted solely by the CAC 40 index, over the period from March 1, 1990 
to April 23, 2007. We use the first 2202 daily returns, corresponding to the period from March 2, 
1990 to December 30, 1998, to estimate the volatility using different methods. To fix ideas, suppose 
that on December 30, 1998 the value of the portfolio was, in French Francs, the equivalent of €1 
million. For the second period, from January 4, 1999 to April 23, 2007 (2120 values), we estimated 
VaR at horizon $h = 1$ and level $\alpha = 1\%$ using four methods. 

The first method (historical) is based on the empirical quantiles of the last 250 returns. The 
second method is RiskMetrics. The initial value for $\sigma_1^2$ was chosen equal to the average of the 
last 250 squared returns of the period from March 2, 1990 to December 30, 1998, and we took 
$\lambda = 0.94$. The third method (GARCH-N) relies on a GARCH(1, 1) model with Gaussian $\mathcal{N}(0,1)$ 
innovations. With this method we set $\hat F^{-1}(0.01) = -2.32635$, the 1% quantile of the $\mathcal{N}(0,1)$ 
distribution. The last method (GARCH-NP) estimates volatility using a GARCH(1, 1) model, and 
approximates $\hat F^{-1}(0.01)$ by the empirical 1% quantile of the standardized residuals. For the last 
two methods, we estimated a GARCH(1, 1) on the first period, and kept this GARCH model for all 
VaR estimations of the second period. The estimated VaR and the effective losses were compared 
for the 2120 data of the second period. 

Table 12.1 and Figure 12.2 do not allow us to draw definitive conclusions, but the historical 
method appears to be outperformed by the GARCH-NP method. On this example, the only method 
which adequately controls the level 1% is GARCH-NP, which is not surprising since the 
empirical distribution of the standardized residuals is very far from Gaussian. 

Table 12.1 Comparison of the four VaR estimation methods for the CAC 40. On the 2120 
values, the VaR at the 1% level should only be violated 2120 × 1% = 21.2 times on average. 

                              Historical   RiskMetrics   GARCH-N   GARCH-NP
Average estimated VaR (€)         38 323        32 235    31 950     35 059
No. of losses > VaR                   29            37        37         21

Figure 12.2 Effective losses of the CAC 40 (solid lines) and estimated VaRs (dotted lines) in 
thousands of euros for the historical method (top left), RiskMetrics (top right), GARCH-N (bottom 
left) and GARCH-NP (bottom right). 
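To gauge the violation counts of Table 12.1, one may ask how likely they would be if the hit variables were iid Bernoulli with parameter 0.01. Under that assumption, the number of violations over 2120 days is Binomial(2120, 0.01). The assumption is ours and gives only a rough benchmark; formal backtests are discussed in the references of Section 12.4. The sketch computes the upper-tail probability $P(X \ge k)$ with the standard library.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1).

    Summing the lower tail is adequate for small k such as the counts below; for
    large k the integer binomial coefficients no longer fit in a float.
    """
    lower = math.fsum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))
    return 1.0 - lower

n_days, alpha = 2120, 0.01
violations = {"Historical": 29, "RiskMetrics": 37, "GARCH-N": 37, "GARCH-NP": 21}
for method, k in violations.items():
    print(f"{method:12s} {k:3d}  P(X >= k) = {binom_upper_tail(k, n_days, alpha):.4f}")
```

Small tail probabilities for RiskMetrics and GARCH-N indicate more violations than a correctly calibrated 1% VaR would typically produce, which is consistent with the comment on the table.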


12.4 Bibliographical Notes 


A detailed presentation of the financial concepts introduced in this chapter is provided in the books 
by Gouriéroux and Jasiak (2001) and Franke, Härdle and Hafner (2004). A classical reference on 
the stochastic calculus is the book by Karatzas and Shreve (1988). 

The relation between continuous-time processes and GARCH processes was established by 
Nelson (1990b) (see also Nelson, 1992; Nelson and Foster, 1994, 1995). The results obtained by 
Nelson rely on concepts presented in the monograph by Stroock and Varadhan (1979). A synthesis 
of these results is presented in Elie (1994). An application of these techniques to the TARCH 
model with contemporaneous asymmetry is developed in El Babsiri and Zakoïan (1990). 

When applied to high-frequency (intraday) data, diffusion processes obtained as GARCH limits 
when the time unit tends to zero are often found inadequate, in particular because they do not allow 
for daily periodicity. There is a vast literature on the so-called realized volatility, which is a daily 
measure of daily return variability. See Barndorff-Nielsen and Shephard (2002) and Andersen et 
al. (2003) for econometric approaches to realized volatility. In the latter paper, it is argued that 
‘standard volatility models used for forecasting at the daily level cannot readily accommodate 
the information in intraday data, and models specified directly for the intraday data generally fail 
to capture the longer interdaily volatility movements sufficiently well’. Another point of view is 
defended in the recent thesis by Visser (2009) in which it is shown that intraday price movements 
can be incorporated into daily GARCH models. 

Concerning the pricing of derivatives, we have purposely limited our presentation to the 
elementary definitions. Specialized monographs on this topic are those of Dana and Jeanblanc-Picqué 
(1994) and Duffie (1994). Many continuous-time models have been proposed to extend the Black 
and Scholes (1973) formula to the case of a nonconstant volatility. The Hull and White (1987) 
approach introduces a stochastic differential equation for the volatility but is not compatible with 
the assumption of a complete market. To overcome this difficulty, Hobson and Rogers (1998) 
developed a stochastic volatility model in which no additional Brownian motion is introduced. A 
discrete-time version of this model was proposed and studied by Jeantheau (2004). 

The characterization of the risk-neutral measure in the GARCH case is due to Duan (1995). 
Numerical methods for computing option prices were developed by Engle and Mustafa (1992) and 
Heston and Nandi (2000), among many others. Problems of option hedging with pricing models 
based on GARCH or stochastic volatility are discussed in Garcia, Ghysels and Renault (1998). The 
empirical performance of pricing models in the GARCH framework is studied by Härdle and Hafner 
(2000), Christoffersen and Jacobs (2004) and the references therein. Valuation of American options 
in the GARCH framework is studied in Duan and Simonato (2001) and Stentoft (2005). The use 
of realized volatility, based on high-frequency data, is considered in Stentoft (2008). Statistical 
properties of the realized volatility in stochastic volatility models are studied by Barndorff-Nielsen 
and Shephard (2002). 

Introduced by Engle, Lilien and Robins (1987), ARCH-M models are characterized by a 
linear relationship between the conditional mean and variance of the returns. These models were 
used to test the validity of the intertemporal capital asset pricing model of Merton (1973) which 
postulates such a relationship (see, for instance, Lanne and Saikkonen, 2006). To our knowledge, 
the asymptotic properties of the QMLE have not been established for ARCH-M models. 

The concept of the stochastic discount factor was developed by Hansen and Richard (1987) 
and, more recently, by Cochrane (2001). Our presentation follows that of Gouriéroux and Tiomo 
(2007). This method is used in Bertholon, Monfort and Pegoraro (2008). 

The concept of coherent risk measures (Definition 12.2) was introduced by Artzner et al. 
(1999), initially on a finite probability space, and extended by Delbaen (2002). In the latter article 
it is shown that, for the existence of coherent risk measures, the set $\mathcal{L}$ cannot be too large, for 
instance the set of all absolutely continuous random variables. Alternative axioms were introduced 
by Wang, Young and Panjer (1997), initially for risk analysis in insurance. Dynamic VaR models 
were proposed by Koenker and Xiao (2006) (quantile autoregressive models), Engle and Manganelli 
(2004) (conditional autoregressive VaR), Gouriéroux and Jasiak (2008) (dynamic additive quantile). 
The issue of assessing risk measures was considered by Christoffersen (1998), Christoffersen and 
Pelletier (2004), Engle and Manganelli (2004) and Hurlin and Tokpavi (2006), among others. 
The article by Escanciano and Olmo (2010) considers the impact of parameter estimation in risk 
measure assessment. Evaluation of VaR at horizons longer than 1, under GARCH dynamics, is 
discussed by Ardia (2008). 


12.5 Exercises 


12.1 (Linear SDE) 
Consider the linear SDE (12.6). Letting $X_t^0$ denote the solution obtained for $\omega = 0$, what is 
the equation satisfied by $Y_t = X_t/X_t^0$? 
Hint: the following result, which is a consequence of the multidimensional Itô formula, 
can be used. If $X = (X^1, X^2)$ is a two-dimensional process such that, for a real Brownian 
motion $(W_t)$, 

$$\begin{cases} dX_t^1 = \mu_t^1\,dt + \sigma_t^1\,dW_t, \\ dX_t^2 = \mu_t^2\,dt + \sigma_t^2\,dW_t, \end{cases}$$

under standard assumptions, then 

$$d(X_t^1X_t^2) = X_t^1\,dX_t^2 + X_t^2\,dX_t^1 + \sigma_t^1\sigma_t^2\,dt.$$

Deduce the solution of (12.6) and verify that if $\omega \ge 0$, $x_0 \ge 0$ and $(\omega, x_0) \ne (0,0)$, then 
this solution will remain strictly positive. 


12.2 (Convergence of the Euler discretization) 
Show that the Euler discretization (12.15), with $\mu$ and $\sigma$ continuous, converges in distribution 
to the solution of the SDE (12.1), assuming that this equation admits a unique (in 
distribution) solution. 


12.3 (Another limiting process for the GARCH(1, 1) (Corradi, 2000)) 
Instead of the rates of convergence (12.30) for the parameters of a GARCH(1, 1), consider 

$$\lim_{\tau\to 0}\tau^{-1}\omega_\tau = \omega, \qquad \lim_{\tau\to 0}\tau^{-s}\alpha_\tau = 0 \ \ \forall s < 1, \qquad \lim_{\tau\to 0}\tau^{-1}(\alpha_\tau + \beta_\tau - 1) = -\delta.$$

Give an example of the sequence $(\omega_\tau, \alpha_\tau, \beta_\tau)$ compatible with these conditions. Determine 
the limiting process of $(X_t^{(\tau)}, (\sigma_t^{(\tau)})^2)$ when $\tau \to 0$. Show that, in this model, the volatility 
$\sigma_t^2$ has a nonstochastic limit when $t \to \infty$. 


12.4 (Put—call parity) 
Using the martingale property for the actualized price under the risk-neutral probability, 
deduce the European put option price from the European call option price. 


12.5 (Delta of a European call) 
Compute the derivative with respect to $S_t$ of the European call option price and check that 
it is positive. 


12.6 (Volatility of an option price) 
Show that the European call option price $C_t = C(S_t, t)$ is a solution of an SDE of the form 

$$dC_t = \mu_tC_t\,dt + \sigma_tC_t\,dW_t$$

with $\sigma_t > \sigma$. 


12.7 (Estimation of the drift and volatility) 
Compute the maximum likelihood estimators of $\mu$ and $\sigma^2$ based on observations $S_1, \dots, S_n$ 
of the geometric Brownian motion. 


12.8 (Vega of a European call) 
A measure of the sensitivity of an option to the volatility of the underlying asset is the 
so-called vega coefficient defined by $\partial C_t/\partial\sigma$. Compute this coefficient for a European call 
and verify that it is positive. Is this intuitive? 


12.9 (Martingale property under the risk-neutral probability) 
Verify that under the measure $\pi$ defined in (12.52), the actualized price $e^{-rt}S_t$ is a martingale. 


12.10 (Risk-neutral probability for a nonlinear GARCH model) 
Duan (1995) considered the model 

$$\begin{cases} \log(S_t/S_{t-1}) = r + \lambda\sigma_t + \epsilon_t, \\ \epsilon_t = \sigma_t\eta_t, \\ \sigma_t^2 = \omega + \{\alpha(\eta_{t-1} - \gamma)^2 + \beta\}\sigma_{t-1}^2, \end{cases}$$

where $\omega > 0$, $\alpha, \beta \ge 0$ and $(\eta_t)$ iid $\mathcal{N}(0,1)$. Establish the strict and second-order stationarity 
conditions for the process $(\epsilon_t)$. Determine the risk-neutral probability using stochastic 
discount factors, chosen to be affine exponential with time-dependent coefficients. 


12.11 (A nonaffine exponential SDF) 
Consider an SDF of the form 

$$M_{t,t+1} = \exp(a_t + b_t\eta_{t+1} + c_t\eta_{t+1}^2).$$

Show that, by an appropriate choice of the coefficients $a_t$, $b_t$ and $c_t$, with $c_t \ne 0$, a 
risk-neutral probability can be obtained for model (12.51). Derive the risk-neutral version of the 
model and verify that the volatility differs from that of the initial model. 


12.12 (An erroneous computation of the VaR at horizon h) 
The aim of this exercise is to show that (12.57) may be wrong if the price variations are 
iid but non-Gaussian. Suppose that $(\Delta P_t)$ is iid, with a double exponential density with 
parameter $\lambda$, given by $f(x) = 0.5\lambda\exp\{-\lambda|x|\}$. Calculate $\mathrm{VaR}_{t,1}(\alpha)$. What is the density of 
$L_{t,t+2}$? Deduce the equation for VaR at horizon 2. Show, for instance for $\lambda = 0.1$, that VaR 
is overestimated if (12.57) is applied with $\alpha = 0.01$, but is underestimated with $\alpha = 0.05$. 


12.13 (VaR for AR(1) price variations) 
Check that formula (12.58) is satisfied. 


12.14 (VaR for ARCH(1) price variations) 
Suppose that the price variations follow an ARCH(1) model (12.59). Show that the distribution 
of $\Delta P_{t+2}$ conditional on the information at time $t$ is not Gaussian if $\alpha_1 > 0$. Deduce 
that VaR at horizon 2 is not easily computable. 
12.15 (VaR and conditional quantile) 
Derive formula (12.60), giving the relationship between VaR and the conditional quantile of the 
returns. 


12.16 (Integral formula for the ES) 
Using the fact that $L_{t,t+h}$ and $F^{-1}(U)$ have the same distribution, where $U$ denotes a 
variable uniformly distributed on $[0,1]$ and $F$ the cdf of $L_{t,t+h}$, derive formula (12.64). 


12.17 (Coherence of the ES) 
Prove that the ES is a coherent risk measure. 
Hint for proving subadditivity: For $L_i$ such that $E|L_i| < \infty$, $i = 1, 2, 3$, denote the value at 
risk at level $\alpha$ by $\mathrm{VaR}_i(\alpha)$ and the expected shortfall by $\mathrm{ES}_i(\alpha) = \alpha^{-1}E[L_i\mathbb{1}_{\{L_i > \mathrm{VaR}_i(\alpha)\}}]$. 
For $L_3 = L_1 + L_2$, compute $\alpha\{\mathrm{ES}_1(\alpha) + \mathrm{ES}_2(\alpha) - \mathrm{ES}_3(\alpha)\}$ using expectations and observe 
that 

$$\{L_1 - \mathrm{VaR}_1(\alpha)\}\left(\mathbb{1}_{\{L_1 > \mathrm{VaR}_1(\alpha)\}} - \mathbb{1}_{\{L_3 > \mathrm{VaR}_3(\alpha)\}}\right) \ge 0.$$


12.18 (RiskMetrics is a prediction method, not really a model) 
For any initial value $\sigma_0^2$, let $(\epsilon_t)_{t\ge 1}$ be a sequence of random variables satisfying the 
RiskMetrics model (12.66) for any $t \ge 1$. Show that $\epsilon_t \to 0$ almost surely as $t \to \infty$. 


12.19 (At horizon $h > 1$ the conditional distribution of the future returns is not Gaussian with 
RiskMetrics) 
Prove that in the RiskMetrics model, the conditional distribution of the returns at horizon 
2, $r_{t+1} + r_{t+2}$, is not Gaussian. Conclude that formula (12.67) is incorrect. 


Part IV 
Appendices 


Appendix A 


Ergodicity, Martingales, Mixing 


A.1 Ergodicity 
A stationary sequence is said to be ergodic if it satisfies the strong law of large numbers. 


Definition A.1 (Ergodic stationary processes) A strictly stationary process $(Z_t)_{t\in\mathbb{Z}}$, real-valued, 
is said to be ergodic if and only if, for any Borel set $B$ and any integer $k$, 

$$n^{-1}\sum_{t=1}^n \mathbb{1}_B(Z_t, Z_{t+1}, \dots, Z_{t+k}) \to P\{(Z_1, \dots, Z_{1+k}) \in B\}$$

with probability 1.¹ 


General transformations of ergodic sequences remain ergodic. The proof of the following result 
can be found, for instance, in Billingsley (1995, Theorem 36.4). 


Theorem A.1 If $(Z_t)_{t\in\mathbb{Z}}$ is an ergodic strictly stationary sequence and if $(Y_t)_{t\in\mathbb{Z}}$ is defined by 

$$Y_t = f(\dots, Z_{t-1}, Z_t, Z_{t+1}, \dots),$$

where $f$ is a measurable function from $\mathbb{R}^\infty$ to $\mathbb{R}$, then $(Y_t)_{t\in\mathbb{Z}}$ is also an ergodic strictly stationary 
sequence. 


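In particular, if $(X_t)_{t\in\mathbb{Z}}$ is the nonanticipative stationary solution of the AR(1) equation 

$$X_t = aX_{t-1} + \eta_t, \qquad |a| < 1, \qquad \eta_t \text{ iid } (0, \sigma^2), \tag{A.1}$$

then the theorem shows that $(X_t)_{t\in\mathbb{Z}}$, $(X_{t-1}\eta_t)_{t\in\mathbb{Z}}$ and $(X_t^2)_{t\in\mathbb{Z}}$ are also ergodic stationary 
sequences. 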


1 The ergodicity concept is much more general, and can be extended to nonstationary sequences (see, for 
instance, Billingsley, 1995). 
Theorem A.2 (The ergodic theorem for stationary sequences) If $(Z_t)_{t\in\mathbb{Z}}$ is strictly stationary 
and ergodic, if $f$ is measurable and if 

$$E|f(\dots, Z_{t-1}, Z_t, Z_{t+1}, \dots)| < \infty,$$

then 

$$n^{-1}\sum_{t=1}^n f(\dots, Z_{t-1}, Z_t, Z_{t+1}, \dots) \to Ef(\dots, Z_{t-1}, Z_t, Z_{t+1}, \dots) \quad \text{a.s.}$$


As an example, consider the least-squares estimator $\hat a_n$ of the parameter $a$ in (A.1). By definition 

$$\hat a_n = \arg\min_a Q_n(a), \qquad Q_n(a) = \sum_{t=2}^n (X_t - aX_{t-1})^2.$$

From the first-order condition, we obtain 

$$\hat a_n = \frac{\sum_{t=2}^n X_tX_{t-1}}{\sum_{t=2}^n X_{t-1}^2} = \frac{n^{-1}\sum_{t=2}^n X_tX_{t-1}}{n^{-1}\sum_{t=2}^n X_{t-1}^2}.$$

The ergodic theorem shows that the numerator tends almost surely to $\gamma(1) = \mathrm{Cov}(X_t, X_{t-1}) = 
a\gamma(0)$ and that the denominator tends to $\gamma(0)$. It follows that $\hat a_n \to a$ almost surely as $n \to \infty$. 
Note that this result still holds true when the assumption that $\eta_t$ is a strong white noise is replaced 
by the assumption that $\eta_t$ is a semi-strong white noise, or even by the assumption that $\eta_t$ is an 
ergodic and stationary weak white noise. 
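The almost sure convergence of $\hat a_n$ can be visualized by simulation. The sketch assumes Gaussian noise (the text only requires an iid $(0, \sigma^2)$ sequence). It starts the recursion at 0 and discards a burn-in to approximate the stationary solution; all names and values are illustrative.

```python
import random

def simulate_ar1(a, n, sigma=1.0, burn=500, seed=0):
    """Simulate X_t = a X_{t-1} + eta_t with iid N(0, sigma^2) noise, dropping a burn-in."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(n + burn):
        x = a * x + rng.gauss(0.0, sigma)
        if t >= burn:
            path.append(x)
    return path

def ls_estimate(path):
    """a_hat_n = sum_{t>=2} X_t X_{t-1} / sum_{t>=2} X_{t-1}^2 (first-order condition)."""
    num = sum(path[t] * path[t - 1] for t in range(1, len(path)))
    den = sum(path[t - 1] ** 2 for t in range(1, len(path)))
    return num / den

for n in (100, 1_000, 100_000):
    print(n, ls_estimate(simulate_ar1(0.5, n, seed=n)))
```

As $n$ grows, the printed estimates settle near the true value $a = 0.5$, with fluctuations of order $n^{-1/2}$ in line with (A.4).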


A.2 Martingale Increments 


In a purely random fair game (for instance, A and B play ‘heads or tails’, A gives one dollar to B 
if the coin falls tails, and B gives one dollar to A if the coin falls heads), the winnings of a player 
constitute a martingale. 


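Definition A.2 (Martingale) Let $(Y_t)_{t\in\mathbb{N}}$ be a sequence of real random variables and $(\mathcal{F}_t)_{t\in\mathbb{N}}$ 
be a sequence of $\sigma$-fields. The sequence $(Y_t, \mathcal{F}_t)_{t\in\mathbb{N}}$ is said to be a martingale if and only if 

1. $\mathcal{F}_t \subset \mathcal{F}_{t+1}$; 
2. $Y_t$ is $\mathcal{F}_t$-measurable; 
3. $E|Y_t| < \infty$; 
4. $E(Y_{t+1} \mid \mathcal{F}_t) = Y_t$. 

When $(Y_t)_{t\in\mathbb{N}}$ is said to be a martingale, it is implicitly assumed that $\mathcal{F}_t = \sigma(Y_u, u \le t)$, that is, 
the $\sigma$-field generated by the past and present values. 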


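Definition A.3 (Martingale difference) Let $(\eta_t)_{t\in\mathbb{N}}$ be a sequence of real random variables and 
$(\mathcal{F}_t)_{t\in\mathbb{N}}$ be a sequence of $\sigma$-fields. The sequence $(\eta_t, \mathcal{F}_t)_{t\in\mathbb{N}}$ is said to be a martingale difference 
(or a sequence of martingale increments) if and only if 

1. $\mathcal{F}_t \subset \mathcal{F}_{t+1}$; 
2. $\eta_t$ is $\mathcal{F}_t$-measurable; 
3. $E|\eta_t| < \infty$; 
4. $E(\eta_{t+1} \mid \mathcal{F}_t) = 0$. 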
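Remark A.1 If $(Y_t, \mathcal{F}_t)_{t\in\mathbb{N}}$ is a martingale and $\eta_0 = Y_0$, $\eta_t = Y_t - Y_{t-1}$, then $(\eta_t, \mathcal{F}_t)_{t\in\mathbb{N}}$ is a 
martingale difference: $E(\eta_{t+1} \mid \mathcal{F}_t) = E(Y_{t+1} \mid \mathcal{F}_t) - E(Y_t \mid \mathcal{F}_t) = 0$. 


Remark A.2 If $(\eta_t, \mathcal{F}_t)_{t\in\mathbb{N}}$ is a martingale difference and $Y_t = \eta_0 + \eta_1 + \dots + \eta_t$, then 
$(Y_t, \mathcal{F}_t)_{t\in\mathbb{N}}$ is a martingale: $E(Y_{t+1} \mid \mathcal{F}_t) = E(Y_t + \eta_{t+1} \mid \mathcal{F}_t) = Y_t$. 


Remark A.3 In example (A.1), 

$$\left\{\sum_{i=0}^k a^i\eta_{t-i},\ \sigma(\eta_u, t-k \le u \le t)\right\}_{k\in\mathbb{N}}$$

is a martingale, and $\{\eta_t, \sigma(\eta_u, u \le t)\}_{t\in\mathbb{N}}$, $\{\eta_tX_{t-1}, \sigma(\eta_u, u \le t)\}_{t\in\mathbb{N}}$ are martingale differences. 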


There exists a central limit theorem (CLT) for triangular sequences of martingale differences (see 
Billingsley, 1995, p. 476). 


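Theorem A.3 (Lindeberg’s CLT) Assume that, for each $n > 0$, $(\eta_{nk}, \mathcal{F}_{nk})_{k\in\mathbb{N}}$ is a sequence of 
square integrable martingale increments. Let $\sigma_{nk}^2 = E(\eta_{nk}^2 \mid \mathcal{F}_{n(k-1)})$. If 

$$\sum_{k=1}^n \sigma_{nk}^2 \to \sigma_0^2 \quad \text{in probability as } n \to \infty, \tag{A.2}$$

where $\sigma_0^2$ is a strictly positive constant, and 

$$\sum_{k=1}^n E\eta_{nk}^2\mathbb{1}_{\{|\eta_{nk}| \ge \epsilon\}} \to 0 \quad \text{as } n \to \infty, \tag{A.3}$$

for any positive real $\epsilon$, then $\sum_{k=1}^n \eta_{nk} \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma_0^2)$. 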


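Remark A.4 In numerous applications, $\eta_{nk}$ and $\mathcal{F}_{nk}$ are only defined for $1 \le k \le n$ and can be 
displayed as a triangular array 

$$\begin{array}{cccc}
\eta_{11} & & & \\
\eta_{21} & \eta_{22} & & \\
\eta_{31} & \eta_{32} & \eta_{33} & \\
\vdots & & & \ddots \\
\eta_{n1} & \eta_{n2} & \cdots & \eta_{nn}
\end{array}$$

One can define $\eta_{nk}$ and $\mathcal{F}_{nk}$ for all $k \ge 0$, with $\eta_{n0} = 0$, $\mathcal{F}_{n0} = \{\emptyset, \Omega\}$ and $\eta_{nk} = 0$, $\mathcal{F}_{nk} = \mathcal{F}_{nn}$ 
for all $k > n$. In the theorem each row of the triangular array is assumed to be a martingale 
difference. 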


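Remark A.5 The previous theorem encompasses the usual CLT. Let $Z_1, \dots, Z_n$ be an iid 
sequence with a finite variance. It suffices to take 

$$\eta_{nk} = \frac{Z_k - EZ_k}{\sqrt{n}} \quad\text{and}\quad \mathcal{F}_{nk} = \sigma(Z_1, \dots, Z_k).$$

It is clear that $(\eta_{nk}, \mathcal{F}_{nk})_{k\in\mathbb{N}}$ is a square integrable martingale difference. We have $\sigma_{nk}^2 = E\eta_{nk}^2 = 
n^{-1}\mathrm{Var}(Z_1)$. Consequently, the normalization condition (A.2) is satisfied. Moreover, 

$$\sum_{k=1}^n E\eta_{nk}^2\mathbb{1}_{\{|\eta_{nk}| \ge \epsilon\}} = \sum_{k=1}^n \int_{\{|Z_k - EZ_k| \ge \sqrt{n}\epsilon\}} \frac{1}{n}|Z_k - EZ_k|^2\,dP = \int_{\{|Z_1 - EZ_1| \ge \sqrt{n}\epsilon\}} |Z_1 - EZ_1|^2\,dP \to 0$$

because $\{|Z_1 - EZ_1| \ge \sqrt{n}\epsilon\} \downarrow \emptyset$ and $\int_\Omega |Z_1 - EZ_1|^2\,dP < \infty$. The Lindeberg condition (A.3) 
is thus satisfied. The theorem entails the standard CLT: 

$$\sum_{k=1}^n \eta_{nk} = \frac{1}{\sqrt{n}}\sum_{k=1}^n (Z_k - EZ_k) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \mathrm{Var}(Z_1)).$$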
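Remark A.6 In example (A.1), take 

$$\eta_{nk} = \frac{\eta_kX_{k-1}}{\sqrt{n}} \quad\text{and}\quad \mathcal{F}_{nk} = \sigma(\eta_u, u \le k).$$

The sequence $(\eta_{nk}, \mathcal{F}_{nk})_{k\in\mathbb{N}}$ is a square integrable martingale difference. We have $\sigma_{nk}^2 = 
n^{-1}\sigma^2X_{k-1}^2$. The ergodic theorem entails (A.2) with $\sigma_0^2 = \sigma^4/(1-a^2)$. We obtain 

$$\sum_{k=1}^n E\eta_{nk}^2\mathbb{1}_{\{|\eta_{nk}| \ge \epsilon\}} = \sum_{k=1}^n \int_{\{|\eta_kX_{k-1}| \ge \sqrt{n}\epsilon\}} \frac{1}{n}|\eta_kX_{k-1}|^2\,dP = \int_{\{|\eta_1X_0| \ge \sqrt{n}\epsilon\}} |\eta_1X_0|^2\,dP \to 0$$

because $\{|\eta_1X_0| \ge \sqrt{n}\epsilon\} \downarrow \emptyset$ and $\int_\Omega |\eta_1X_0|^2\,dP < \infty$. This shows (A.3). The Lindeberg CLT 
entails that 

$$n^{-1/2}\sum_{k=1}^n \eta_kX_{k-1} \xrightarrow{\mathcal{L}} \mathcal{N}\{0, \sigma^4/(1-a^2)\}.$$

It follows that 

$$n^{1/2}(\hat a_n - a) = \frac{n^{-1/2}\sum_{k=1}^n \eta_kX_{k-1}}{n^{-1}\sum_{k=1}^n X_{k-1}^2} \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1-a^2), \tag{A.4}$$

because $n^{-1}\sum_{k=1}^n X_{k-1}^2 \to \sigma^2/(1-a^2)$ a.s.² 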


Remark A.7 The previous result can be used to obtain an asymptotic confidence interval or to 
test the coefficient $a$: 

1. $\left[\hat a_n \pm 1.96\,n^{-1/2}(1 - \hat a_n^2)^{1/2}\right]$ is a confidence interval for $a$ at the asymptotic 95% confidence 
level. 
2. The null hypothesis $H_0: a = 0$ is rejected at the asymptotic 5% level if $|t_n| > 1.96$, where 
$t_n = \sqrt{n}\,\hat a_n/\sqrt{1 - \hat a_n^2}$ is the $t$-statistic. 


² We have used Slutsky’s lemma: if $Y_n \xrightarrow{\mathcal{L}} Y$ and $T_n \to T$ in probability, then $T_nY_n \xrightarrow{\mathcal{L}} YT$. 
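In the previous statistics, $1 - \hat a_n^2$ is often replaced by $\hat\sigma^2/\hat\gamma(0)$, where $\hat\sigma^2 = \sum_{t=2}^n (X_t - 
\hat a_nX_{t-1})^2/(n-1)$ and $\hat\gamma(0) = n^{-1}\sum_{t=1}^n X_t^2$. Asymptotically, there is no difference: since 
$\sum_{t=2}^n X_tX_{t-1} = \hat a_n\sum_{t=2}^n X_{t-1}^2$, 

$$\frac{n-1}{n}\,\frac{\hat\sigma^2}{\hat\gamma(0)} = \frac{\sum_{t=2}^n (X_t - \hat a_nX_{t-1})^2}{\sum_{t=1}^n X_t^2} = \frac{\sum_{t=2}^n X_t^2 + \hat a_n^2\sum_{t=2}^n X_{t-1}^2 - 2\hat a_n\sum_{t=2}^n X_tX_{t-1}}{\sum_{t=1}^n X_t^2} = \frac{\sum_{t=2}^n X_t^2}{\sum_{t=1}^n X_t^2} - \hat a_n^2\,\frac{\sum_{t=2}^n X_{t-1}^2}{\sum_{t=1}^n X_t^2},$$

and both ratios tend to 1. However, it is preferable to use $\hat\sigma^2/\hat\gamma(0)$, which is always positive, rather than $1 - \hat a_n^2$ because, in 
finite samples, one can have $\hat a_n^2 > 1$. 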
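Remark A.7, with $1 - \hat a_n^2$ replaced by $\hat\sigma^2/\hat\gamma(0)$, can be sketched as follows. The function name and the simulated Gaussian AR(1) data with $a = 0.3$ are illustrative assumptions; the code also returns the variance term so that it can be compared with $1 - \hat a_n^2$.

```python
import math
import random

def ar1_inference(path):
    """LS estimate, t-statistic and asymptotic 95% CI of Remark A.7 for an AR(1) sample.

    1 - a_hat**2 is replaced by sigma2_hat / gamma0_hat, which cannot be negative.
    """
    n = len(path)
    num = sum(path[t] * path[t - 1] for t in range(1, n))
    den = sum(path[t - 1] ** 2 for t in range(1, n))
    a_hat = num / den
    sigma2_hat = sum((path[t] - a_hat * path[t - 1]) ** 2 for t in range(1, n)) / (n - 1)
    gamma0_hat = sum(x * x for x in path) / n
    v = sigma2_hat / gamma0_hat
    half_width = 1.96 * math.sqrt(v / n)
    t_stat = math.sqrt(n) * a_hat / math.sqrt(v)
    return a_hat, t_stat, (a_hat - half_width, a_hat + half_width), v

# Illustrative data: a simulated Gaussian AR(1) with a = 0.3 (an assumption).
rng = random.Random(3)
x, path = 0.0, []
for _ in range(5_000):
    x = 0.3 * x + rng.gauss(0.0, 1.0)
    path.append(x)
a_hat, t_stat, ci, v = ar1_inference(path)
reject_h0 = abs(t_stat) > 1.96  # test of H0: a = 0 at the asymptotic 5% level
print(a_hat, t_stat, ci, v, 1 - a_hat ** 2, reject_h0)
```

On such a sample the two variance terms $v$ and $1 - \hat a_n^2$ differ only by end effects of order $1/n$, as the computation above indicates.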


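The following corollary applies to GARCH processes, which are stationary and ergodic martingale 
differences. 


Corollary A.1 (Billingsley, 1961) If $(\nu_t, \mathcal{F}_t)$ is a stationary and ergodic sequence of square 
integrable martingale increments such that $\sigma_\nu^2 = \mathrm{Var}(\nu_t) \ne 0$, then 

$$n^{-1/2}\sum_{t=1}^n \nu_t \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma_\nu^2).$$

Proof. Let $\eta_{nk} = \nu_k/\sqrt{n}$ and $\mathcal{F}_{nk} = \mathcal{F}_k$. For all $n$, the sequence $(\eta_{nk}, \mathcal{F}_k)_k$ is a square 
integrable martingale difference. With the notation of Theorem A.3, we have $\sigma_{nk}^2 = E(\eta_{nk}^2 \mid \mathcal{F}_{k-1})$, and 
$(n\sigma_{nk}^2)_k = \{E(\nu_k^2 \mid \mathcal{F}_{k-1})\}_k$ is a stationary and ergodic sequence. We thus have almost surely 

$$\sum_{k=1}^n \sigma_{nk}^2 = \frac{1}{n}\sum_{k=1}^n E(\nu_k^2 \mid \mathcal{F}_{k-1}) \to E\{E(\nu_k^2 \mid \mathcal{F}_{k-1})\} = \sigma_\nu^2 > 0,$$

which shows the normalization condition (A.2). Moreover, 

$$\sum_{k=1}^n E\eta_{nk}^2\mathbb{1}_{\{|\eta_{nk}| \ge \epsilon\}} = \sum_{k=1}^n \frac{1}{n}\int_{\{|\nu_k| \ge \sqrt{n}\epsilon\}} \nu_k^2\,dP = \int_{\{|\nu_1| \ge \sqrt{n}\epsilon\}} \nu_1^2\,dP \to 0,$$

using stationarity and Lebesgue’s theorem. This shows (A.3). The corollary is thus a consequence 
of Theorem A.3. 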


A.3 Mixing 


Numerous probabilistic tools have been developed for measuring the dependence between variables. 
For a process, elementary measures of dependence are the autocovariances and autocorrelations. 
When there is no linear dependence between $X_t$ and $X_{t+h}$, as is the case for a GARCH 
process, the autocorrelation is not the right tool, and more elaborate concepts are required. Mixing 
assumptions, introduced by Rosenblatt (1956), are used to convey different ideas of asymptotic 
independence between the past and future of a process. We present here two of the most popular 
mixing coefficients. 
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A.3.1 a-Mixing and 6-Mixing Coefficients 


The strong mixing (or a-mixing) coefficient between two o-fields A and £? is defined by 


a(A,B)= sup |P(AN B)—P(A)P(B)|. 
AcA, BEB 


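It is clear that:
(i) if $\mathcal{A}$ and $\mathcal{B}$ are independent then $\alpha(\mathcal{A}, \mathcal{B}) = 0$;
(ii) $0 \leq \alpha(\mathcal{A}, \mathcal{B}) \leq 1/4$;⁴
(iii) $\alpha(\mathcal{A}, \mathcal{A}) = 1/4$ provided that $\mathcal{A}$ contains an event of probability 1/2;
(iv) $\alpha(\mathcal{A}, \mathcal{A}) > 0$ provided that $\mathcal{A}$ is nontrivial;⁵
(v) $\alpha(\mathcal{A}', \mathcal{B}') \leq \alpha(\mathcal{A}, \mathcal{B})$ provided that $\mathcal{A}' \subset \mathcal{A}$ and $\mathcal{B}' \subset \mathcal{B}$.
The strong mixing coefficients of a process $X = (X_t)$ are defined by

$$\alpha_X(h) = \sup_t \alpha\{\sigma(X_u, u \leq t), \sigma(X_u, u \geq t+h)\}.$$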


If $X$ is stationary, the term $\sup_t$ can be omitted. In this case, we have

$$\alpha_X(h) = \sup_{A,B}|P(A \cap B) - P(A)P(B)| = \frac{1}{4}\sup_{f,g}|\mathrm{Cov}(f(\ldots, X_{-1}, X_0), g(X_h, X_{h+1}, \ldots))|, \tag{A.5}$$

where the first supremum is taken on $A \in \sigma(X_s, s \leq 0)$ and $B \in \sigma(X_s, s \geq h)$ and the second is taken on the set of the measurable functions $f$ and $g$ such that $|f| \leq 1$, $|g| \leq 1$. $X$ is said to be strongly mixing, or α-mixing, if $\alpha_X(h) \to 0$ as $h \to \infty$. If $\alpha_X(h)$ tends to zero at an exponential rate, then $X$ is said to be geometrically strongly mixing.

The β-mixing coefficients of a stationary process $X$ are defined by

$$\beta_X(k) = E\sup_{B \in \sigma(X_s, s \geq k)}|P(B \mid \sigma(X_s, s \leq 0)) - P(B)| = \frac{1}{2}\sup\sum_{i=1}^{I}\sum_{j=1}^{J}|P(A_i \cap B_j) - P(A_i)P(B_j)|, \tag{A.6}$$

where in the last equality, the sup is taken among all the pairs of partitions $\{A_1, \ldots, A_I\}$ and $\{B_1, \ldots, B_J\}$ of $\Omega$ such that $A_i \in \sigma(X_s, s \leq 0)$ for all $i$ and $B_j \in \sigma(X_s, s \geq k)$ for all $j$. The process is said to be β-mixing if $\lim_{k \to \infty}\beta_X(k) = 0$. We have

$$\alpha_X(k) \leq \beta_X(k),$$


³ Obviously we are working with a probability space $(\Omega, \mathcal{A}_0, P)$, and $\mathcal{A}$ and $\mathcal{B}$ are sub-σ-fields of $\mathcal{A}_0$.
⁴ It suffices to note that

$$|P(A \cap B) - P(A)P(B)| = |P(A \cap B) - P(A \cap B)P(B) - P(A \cap B^c)P(B)| = |P(A \cap B)P(B^c) - P(A \cap B^c)P(B)| \leq P(B)P(B^c) \leq 1/4.$$

For the first inequality we use $||a| - |b|| \leq \max\{|a|, |b|\}$. Alternatively, one can note that $P(A \cap B) - P(A)P(B) = \mathrm{Cov}(\mathbb{1}_A, \mathbb{1}_B)$ and use the Cauchy–Schwarz inequality.
⁵ That is, $\mathcal{A}$ contains an event of probability different from 0 and 1.
so that β-mixing implies α-mixing. If $Y = (Y_t)$ is a process such that $Y_t = f(X_t, \ldots, X_{t-r})$ for a measurable function $f$ and an integer $r \geq 0$, then $\sigma(Y_t, t \leq s) \subset \sigma(X_t, t \leq s)$ and $\sigma(Y_t, t \geq s) \subset \sigma(X_t, t \geq s-r)$. In view of point (v) above, this entails that

$$\alpha_Y(k) \leq \alpha_X(k-r) \quad \text{and} \quad \beta_Y(k) \leq \beta_X(k-r) \quad \text{for all } k \geq r. \tag{A.7}$$


Example A.1 It is clear that a $q$-dependent process such that

$$X_t \in \sigma(\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q}), \quad \text{where the } \epsilon_t \text{ are independent},$$

is strongly mixing, because

$$\alpha_X(h) \leq \alpha_\epsilon(h-q) = 0, \quad \forall h > q.$$

Example A.2 Consider the process defined by

$$X_t = Y\cos(\lambda t),$$

where $\lambda \in (0, \pi)$ and $Y$ is a nontrivial random variable. Note that when $\cos(\lambda t) \neq 0$, which occurs for an infinite number of $t$, we have $\sigma(X_t) = \sigma(Y)$. We thus have, for any $t$ and any $h$, $\sigma(X_u, u \leq t) = \sigma(X_u, u \geq t+h) = \sigma(Y)$, and $\alpha_X(h) = \alpha\{\sigma(Y), \sigma(Y)\} > 0$ by (iv), which shows that $X$ is not strongly mixing.


Example A.3 Let $(u_t)$ be an iid sequence uniformly distributed on $\{1, \ldots, 9\}$. Let

$$X_t = \sum_{i=0}^{\infty}10^{-i-1}u_{t-i}.$$

The sequence $u_t, u_{t-1}, \ldots$ constitutes the decimals of $X_t$; we can write $X_t = 0.u_tu_{t-1}\ldots$. The process $X = (X_t)$ is stationary and satisfies a strong AR(1) representation of the form

$$X_t = \frac{1}{10}X_{t-1} + \frac{1}{10}u_t = \frac{1}{10}X_{t-1} + \frac{1}{2} + \epsilon_t,$$

where $\epsilon_t = (u_t/10) - (1/2)$ is a strong white noise. The process $X$ is not α-mixing because $\sigma(X_t) \subset \sigma(X_{t+h})$ for all $h \geq 0$,⁶ and by (iv) and (v),

$$\alpha_X(h) \geq \alpha\{\sigma(X_t), \sigma(X_t)\} > 0, \quad \forall h \geq 0.$$
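The inclusion $\sigma(X_t) \subset \sigma(X_{t+h})$ can be made concrete with exact rational arithmetic: knowing the current value is enough to read off every past digit. The Python sketch below is only an illustration under stated assumptions: the infinite expansion is truncated by starting the recursion at a hypothetical $X_0 = 0$, and the names `decimal_ar1_path` and `recover_digits` are ours.

```python
import math
import random
from fractions import Fraction


def decimal_ar1_path(digits):
    """X_1, ..., X_T from X_t = X_{t-1}/10 + u_t/10, started (hypothetically)
    at X_0 = 0, so that X_t = 0.u_t u_{t-1} ... u_1 exactly (truncated expansion)."""
    x = Fraction(0)
    path = []
    for u in digits:
        x = x / 10 + Fraction(u, 10)
        path.append(x)
    return path


def recover_digits(x_last, steps):
    """Read u_T, u_{T-1}, ... off X_T alone: since X_{t-1} lies in [0, 1),
    u_t = floor(10 X_t) and X_{t-1} = 10 X_t - u_t."""
    out = []
    x = x_last
    for _ in range(steps):
        u = math.floor(10 * x)
        out.append(u)
        x = 10 * x - u
    return out


rng = random.Random(0)
u = [rng.randint(1, 9) for _ in range(25)]          # iid uniform on {1, ..., 9}
path = decimal_ar1_path(u)
print(recover_digits(path[-1], len(u)) == u[::-1])  # True: X_T reveals the whole past
```

Excluding the digit 0 is what makes the decimal expansion unique, in line with footnote 6; with 0 allowed, two different digit sequences could give the same value.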


A.3.2 Covariance Inequality 


Let $p$, $q$ and $r$ be three positive numbers such that $p^{-1} + q^{-1} + r^{-1} = 1$. Davydov (1968) showed the covariance inequality

$$|\mathrm{Cov}(X, Y)| \leq K_0\|X\|_p\|Y\|_q[\alpha\{\sigma(X), \sigma(Y)\}]^{1/r}, \tag{A.8}$$

where $\|X\|_p^p = E|X|^p$ and $K_0$ is a universal constant. Davydov initially proposed $K_0 = 12$. Rio (1993) obtained a sharper inequality, involving the quantile functions of $X$ and $Y$. The latter inequality also shows that one can take $K_0 = 4$ in (A.8). Note that (A.8) entails that the autocovariance function of an α-mixing stationary process (with enough moments) tends to zero.

⁶ This would not be true if we added 0 to the set of the possible values of $u_t$. For instance, $X_t = 0.4999\ldots = 0.5000\ldots$ would not tell us whether $X_{t-1}$ is equal to $1 = 0.999\ldots$ or to 0.


Example A.4 Consider the process $X = (X_t)$ defined by

$$X_t = Y\cos(\omega t) + Z\sin(\omega t),$$

where $\omega \in (0, \pi)$, and $Y$ and $Z$ are iid $\mathcal{N}(0, 1)$. Then $X$ is Gaussian, centered and stationary, with autocovariance function $\gamma_X(h) = \cos(\omega h)$. Since $\gamma_X(h) \not\to 0$ as $|h| \to \infty$, $X$ is not mixing.


From inequality (A.8), we obtain, for instance, the following results. 


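Corollary A.2 Let $X = (X_t)$ be a centered process such that

$$\sup_t\|X_t\|_{2+\nu} < \infty, \quad \sum_{h=0}^{\infty}\{\alpha_X(h)\}^{\nu/(2+\nu)} < \infty, \quad \text{for some } \nu > 0.$$

We have

$$E\bar{X}_n^2 = O\left(\frac{1}{n}\right), \quad \text{where } \bar{X}_n = n^{-1}\sum_{t=1}^{n}X_t.$$

Proof. Let $K = K_0\sup_t\|X_t\|_{2+\nu}^2$. From (A.8), we obtain

$$|EX_{t_1}X_{t_2}| = |\mathrm{Cov}(X_{t_1}, X_{t_2})| \leq K\{\alpha_X(|t_2 - t_1|)\}^{\nu/(2+\nu)}.$$

We thus have

$$E\bar{X}_n^2 \leq \frac{2}{n^2}\sum_{1 \leq t_1 \leq t_2 \leq n}|EX_{t_1}X_{t_2}| \leq \frac{2K}{n}\sum_{h=0}^{\infty}\{\alpha_X(h)\}^{\nu/(2+\nu)} = O\left(\frac{1}{n}\right).$$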


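Corollary A.3 Let $X = (X_t)$ be a centered process such that

$$\sup_t\|X_t\|_{4+2\nu} < \infty, \quad \sum_{\ell=0}^{\infty}\{\alpha_X(\ell)\}^{\nu/(2+\nu)} < \infty, \quad \text{for some } \nu > 0.$$

We have

$$\sup_t\sup_{h,k \geq 0}\sum_{\ell=-\infty}^{+\infty}|\mathrm{Cov}(X_tX_{t+h}, X_{t+\ell}X_{t+k+\ell})| < \infty.$$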


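Proof. Consider the case $0 \leq h \leq k$. We have

$$\sum_{\ell=-\infty}^{+\infty}|\mathrm{Cov}(X_tX_{t+h}, X_{t+\ell}X_{t+k+\ell})| = d_1 + d_2 + d_3 + d_4,$$

where

$$d_1 = \sum_{\ell=h}^{+\infty}|\mathrm{Cov}(X_tX_{t+h}, X_{t+\ell}X_{t+k+\ell})|, \qquad d_2 = \sum_{\ell=-\infty}^{-k}|\mathrm{Cov}(X_tX_{t+h}, X_{t+\ell}X_{t+k+\ell})|,$$

$$d_3 = \sum_{\ell=0}^{h-1}|\mathrm{Cov}(X_tX_{t+h}, X_{t+\ell}X_{t+k+\ell})|, \qquad d_4 = \sum_{\ell=-k+1}^{-1}|\mathrm{Cov}(X_tX_{t+h}, X_{t+\ell}X_{t+k+\ell})|.$$

Inequality (A.8) shows that $d_1$ and $d_2$ are uniformly bounded in $t$, $h$ and $k$ by

$$K_0\sup_t\|X_t\|_{4+2\nu}^4\sum_{\ell=0}^{\infty}\{\alpha_X(\ell)\}^{\nu/(2+\nu)}.$$

To handle $d_3$, note that $d_3 \leq d_5 + d_6$, where

$$d_5 = \sum_{\ell=0}^{h-1}|\mathrm{Cov}(X_tX_{t+\ell}, X_{t+h}X_{t+k+\ell})|,$$

$$d_6 = \sum_{\ell=0}^{h-1}|EX_tX_{t+h}\,EX_{t+\ell}X_{t+k+\ell} - EX_tX_{t+\ell}\,EX_{t+h}X_{t+k+\ell}|.$$

With the notation $K = K_0\sup_t\|X_t\|_{4+2\nu}^4$, we have

$$d_5 \leq K_0\sup_t\|X_t\|_{4+2\nu}^4\sum_{\ell=0}^{h-1}\{\alpha_X(h-\ell)\}^{\nu/(2+\nu)} \leq K\sum_{\ell=0}^{\infty}\{\alpha_X(\ell)\}^{\nu/(2+\nu)},$$

$$d_6 \leq \sup_t\|X_t\|_2^2\left(h|EX_tX_{t+h}| + \sum_{\ell=0}^{h-1}|EX_tX_{t+\ell}|\right) \leq K\left[h\{\alpha_X(h)\}^{\nu/(2+\nu)} + \sum_{\ell=0}^{\infty}\{\alpha_X(\ell)\}^{\nu/(2+\nu)}\right].$$

For the last inequality, we used

$$\sup_t\|X_t\|_2^2|EX_tX_{t+h}| \leq K_0\sup_t\|X_t\|_2^2\sup_t\|X_t\|_{2+\nu}^2\{\alpha_X(h)\}^{\nu/(2+\nu)} \leq K\{\alpha_X(h)\}^{\nu/(2+\nu)}.$$

Since the sequence of the mixing coefficients is decreasing, it is easy to see that $\sup_{\ell \geq 0}\ell\{\alpha_X(\ell)\}^{\nu/(2+\nu)} < \infty$ (cf. Exercise 3.9). The term $d_3$ is thus uniformly bounded in $t$, $h$ and $k$. We treat $d_4$ as $d_3$ (see Exercise 3.10).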
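Corollary A.4 Under the conditions of Corollary A.3, we have

$$E\bar{X}_n^4 = O\left(\frac{1}{n}\right).$$

Proof. Let $t_1 \leq t_2 \leq t_3 \leq t_4$. From (A.8), we obtain, with $K = K_0\sup_t\|X_t\|_{4+2\nu}^4$ as in the proof of Corollary A.3,

$$|EX_{t_1}X_{t_2}X_{t_3}X_{t_4}| = |\mathrm{Cov}(X_{t_1}, X_{t_2}X_{t_3}X_{t_4})| \leq K_0\|X_{t_1}\|_{4+2\nu}\|X_{t_2}X_{t_3}X_{t_4}\|_{(4+2\nu)/3}\{\alpha_X(|t_2 - t_1|)\}^{\nu/(2+\nu)} \leq K\{\alpha_X(|t_2 - t_1|)\}^{\nu/(2+\nu)}.$$

We thus have

$$E\bar{X}_n^4 \leq \frac{4!}{n^4}\sum_{1 \leq t_1 \leq t_2 \leq t_3 \leq t_4 \leq n}|EX_{t_1}X_{t_2}X_{t_3}X_{t_4}| \leq \frac{4!}{n}K\sum_{h=0}^{\infty}\{\alpha_X(h)\}^{\nu/(2+\nu)} = O\left(\frac{1}{n}\right).$$

In the last inequality, we use the fact that the number of indices $1 \leq t_1 \leq t_2 \leq t_3 \leq t_4 \leq n$ such that $t_2 = t_1 + h$ is less than $n^3$. □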


A.3.3 Central Limit Theorem 


Herrndorf (1984) showed the following central limit theorem. 


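Theorem A.4 (CLT for α-mixing processes) Let $X = (X_t)$ be a centered process such that

$$\sup_t\|X_t\|_{2+\nu} < \infty, \quad \sum_{h=0}^{\infty}\{\alpha_X(h)\}^{\nu/(2+\nu)} < \infty, \quad \text{for some } \nu > 0.$$

If $\sigma^2 = \lim_{n \to \infty}\mathrm{Var}\left(n^{-1/2}\sum_{t=1}^{n}X_t\right)$ exists and is not zero, then

$$n^{-1/2}\sum_{t=1}^{n}X_t \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma^2).$$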


Appendix B 


Autocorrelation and Partial Autocorrelation


B.1 Partial Autocorrelation 


Definition 


The (theoretical) partial autocorrelation at lag $h > 0$, $r_X(h)$, of a second-order stationary process $X = (X_t)$ with nondegenerate linear innovations,¹ is the correlation between

$$X_t - EL(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_{t-h+1})$$

and

$$X_{t-h} - EL(X_{t-h} \mid X_{t-1}, X_{t-2}, \ldots, X_{t-h+1}),$$

where $EL(Y \mid Y_1, \ldots, Y_k)$ denotes the linear regression of a square integrable variable $Y$ on variables $Y_1, \ldots, Y_k$. Let

$$r_X(h) = \mathrm{Corr}(X_t, X_{t-h} \mid X_{t-1}, X_{t-2}, \ldots, X_{t-h+1}). \tag{B.1}$$

The number $r_X(h)$ can be interpreted as the residual correlation between $X_t$ and $X_{t-h}$, after the linear influence of the intermediate variables $X_{t-1}, X_{t-2}, \ldots, X_{t-h+1}$ has been subtracted. Assume that $(X_t)$ is centered, and consider the linear regression of $X_t$ on $X_{t-1}, \ldots, X_{t-h}$:

$$X_t = a_{h,1}X_{t-1} + \cdots + a_{h,h}X_{t-h} + u_{h,t}, \quad u_{h,t} \perp X_{t-1}, \ldots, X_{t-h}. \tag{B.2}$$

We have

$$EL(X_t \mid X_{t-1}, \ldots, X_{t-h}) = a_{h,1}X_{t-1} + \cdots + a_{h,h}X_{t-h}, \tag{B.3}$$

$$EL(X_{t-h-1} \mid X_{t-1}, \ldots, X_{t-h}) = a_{h,1}X_{t-h} + \cdots + a_{h,h}X_{t-1}, \tag{B.4}$$

and

$$r_X(h) = a_{h,h}. \tag{B.5}$$

¹ Thus the variance of $\epsilon_t := X_t - EL(X_t \mid X_{t-1}, \ldots)$ is not equal to zero.
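Proof of (B.3) and (B.4). We obtain (B.3) from (B.2), using the linearity of $EL(\cdot \mid X_{t-1}, \ldots, X_{t-h})$ and the orthogonality $u_{h,t} \perp X_{t-1}, \ldots, X_{t-h}$. The vector of the coefficients of the theoretical linear regression of $X_{t-h-1}$ on $X_{t-1}, \ldots, X_{t-h}$ is given by

$$\left\{E\begin{pmatrix}X_{t-1}\\ \vdots\\ X_{t-h}\end{pmatrix}(X_{t-1} \cdots X_{t-h})\right\}^{-1}EX_{t-h-1}\begin{pmatrix}X_{t-1}\\ \vdots\\ X_{t-h}\end{pmatrix}. \tag{B.6}$$

Since

$$E\begin{pmatrix}X_{t-1}\\ \vdots\\ X_{t-h}\end{pmatrix}(X_{t-1} \cdots X_{t-h}) = E\begin{pmatrix}X_{t-h}\\ \vdots\\ X_{t-1}\end{pmatrix}(X_{t-h} \cdots X_{t-1})$$

and

$$EX_{t-h-1}\begin{pmatrix}X_{t-1}\\ \vdots\\ X_{t-h}\end{pmatrix} = EX_t\begin{pmatrix}X_{t-h}\\ \vdots\\ X_{t-1}\end{pmatrix},$$

this is also the vector of the coefficients of the linear regression of $X_t$ on $X_{t-h}, \ldots, X_{t-1}$, which gives (B.4).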


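Proof of (B.5). From (B.2) we obtain

$$EL(X_t \mid X_{t-1}, \ldots, X_{t-h+1}) = a_{h,1}X_{t-1} + \cdots + a_{h,h-1}X_{t-h+1} + a_{h,h}EL(X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1}).$$

Thus

$$X_t - EL(X_t \mid X_{t-1}, \ldots, X_{t-h+1}) = a_{h,h}\{X_{t-h} - EL(X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1})\} + u_{h,t}.$$

This equality being of the form $Y = a_{h,h}X + u$ with $u \perp X$, we obtain $\mathrm{Cov}(Y, X) = a_{h,h}\mathrm{Var}(X)$, which gives

$$a_{h,h} = \frac{\mathrm{Cov}\{X_t - EL(X_t \mid X_{t-1}, \ldots, X_{t-h+1}),\ X_{t-h} - EL(X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1})\}}{\mathrm{Var}\{X_{t-h} - EL(X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1})\}}.$$

To conclude, it suffices to note that, using the parity of $\gamma_X(\cdot)$ and (B.4),

$$\begin{aligned}\mathrm{Var}\{X_t - EL(X_t \mid X_{t-1}, \ldots, X_{t-h+1})\} &= \mathrm{Var}\{X_t - a_{h-1,1}X_{t-1} - \cdots - a_{h-1,h-1}X_{t-h+1}\}\\ &= \mathrm{Var}\{X_{t-h} - a_{h-1,1}X_{t-h+1} - \cdots - a_{h-1,h-1}X_{t-1}\}\\ &= \mathrm{Var}\{X_{t-h} - EL(X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1})\}.\end{aligned}$$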
Computation Algorithm


From the autocorrelations $\rho_X(1), \ldots, \rho_X(h)$, the partial autocorrelation $r_X(h)$ can be computed rapidly with the aid of Durbin's algorithm:

$$a_{1,1} = \rho_X(1), \tag{B.7}$$

$$a_{k,k} = \frac{\rho_X(k) - \sum_{i=1}^{k-1}\rho_X(k-i)a_{k-1,i}}{1 - \sum_{i=1}^{k-1}\rho_X(i)a_{k-1,i}}, \tag{B.8}$$

$$a_{k,i} = a_{k-1,i} - a_{k,k}a_{k-1,k-i}, \quad i = 1, \ldots, k-1. \tag{B.9}$$

Steps (B.8) and (B.9) are repeated for $k = 2, \ldots, h-1$, and then $r_X(h) = a_{h,h}$ is obtained by step (B.8); see Exercise 1.14.
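A minimal Python transcription of the recursions (B.7)–(B.9) may help when implementing them; the function name `durbin_pacf` and the list-based storage of the coefficients $a_{k,i}$ are illustrative choices, not part of the text.

```python
def durbin_pacf(rho):
    """Partial autocorrelations r(1), ..., r(h) from the autocorrelations
    rho = [rho(1), ..., rho(h)], following the recursions (B.7)-(B.9)."""
    h = len(rho)

    def r(k):                     # rho(k) for k = 1, ..., h
        return rho[k - 1]

    a_prev = [r(1)]               # (B.7): a_{1,1} = rho(1)
    pacf = [r(1)]
    for k in range(2, h + 1):
        # (B.8), where a_prev[i - 1] stores a_{k-1,i}
        num = r(k) - sum(r(k - i) * a_prev[i - 1] for i in range(1, k))
        den = 1.0 - sum(r(i) * a_prev[i - 1] for i in range(1, k))
        a_kk = num / den
        # (B.9): a_{k,i} = a_{k-1,i} - a_{k,k} a_{k-1,k-i}, i = 1, ..., k-1
        a_prev = [a_prev[i - 1] - a_kk * a_prev[k - i - 1] for i in range(1, k)] + [a_kk]
        pacf.append(a_kk)         # (B.5): r(k) = a_{k,k}
    return pacf


# AR(1) with coefficient 0.6: rho(h) = 0.6**h, so r(1) = 0.6 and r(h) = 0 for h > 1
print(durbin_pacf([0.6 ** k for k in range(1, 6)]))
```

As a further check, for an MA(1) $X_t = \eta_t + \theta\eta_{t-1}$ one has $\rho(1) = \theta/(1+\theta^2)$ and $\rho(h) = 0$ for $h > 1$; with $\theta = 0.5$, i.e. `durbin_pacf([0.4, 0.0])`, the second value is $-\rho^2(1)/\{1-\rho^2(1)\} = -0.16/0.84$.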
Proof of (B.9). In view of (B.2),

$$EL(X_t \mid X_{t-1}, \ldots, X_{t-k+1}) = \sum_{i=1}^{k-1}a_{k,i}X_{t-i} + a_{k,k}EL(X_{t-k} \mid X_{t-1}, \ldots, X_{t-k+1}).$$

Using (B.4), we thus have

$$\sum_{i=1}^{k-1}a_{k-1,i}X_{t-i} = \sum_{i=1}^{k-1}a_{k,i}X_{t-i} + a_{k,k}\sum_{i=1}^{k-1}a_{k-1,k-i}X_{t-i},$$

which gives (B.9) (the variables $X_{t-1}, \ldots, X_{t-k+1}$ are not almost surely linearly dependent because the innovations of $(X_t)$ are nondegenerate).


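Proof of (B.8). The vector of coefficients of the linear regression of $X_t$ on $X_{t-1}, \ldots, X_{t-h}$ satisfies

$$E\begin{pmatrix}X_{t-1}\\ \vdots\\ X_{t-h}\end{pmatrix}(X_{t-1} \cdots X_{t-h})\begin{pmatrix}a_{h,1}\\ \vdots\\ a_{h,h}\end{pmatrix} = EX_t\begin{pmatrix}X_{t-1}\\ \vdots\\ X_{t-h}\end{pmatrix}. \tag{B.10}$$

The last row of (B.10) yields

$$\sum_{i=1}^{h}a_{h,i}\gamma(h-i) = \gamma(h).$$

Using (B.9), we thus have

$$\begin{aligned}a_{h,h} &= \rho(h) - \sum_{i=1}^{h-1}\rho(h-i)a_{h,i}\\ &= \rho(h) - \sum_{i=1}^{h-1}\rho(h-i)(a_{h-1,i} - a_{h,h}a_{h-1,h-i})\\ &= \frac{\rho(h) - \sum_{i=1}^{h-1}\rho(h-i)a_{h-1,i}}{1 - \sum_{i=1}^{h-1}\rho(h-i)a_{h-1,h-i}},\end{aligned}$$

which gives (B.8), because $\sum_{i=1}^{h-1}\rho(h-i)a_{h-1,h-i} = \sum_{i=1}^{h-1}\rho(i)a_{h-1,i}$.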
Behavior of the Empirical Partial Autocorrelation


The empirical partial autocorrelation, $\hat{r}(h)$, is obtained from the algorithm (B.7)–(B.9), replacing the theoretical autocorrelations $\rho_X(k)$ by the empirical autocorrelations $\hat{\rho}_X(k)$, defined by

$$\hat{\gamma}_X(h) = \hat{\gamma}_X(-h) = n^{-1}\sum_{t=1}^{n-h}X_tX_{t+h}, \qquad \hat{\rho}_X(h) = \frac{\hat{\gamma}_X(h)}{\hat{\gamma}_X(0)},$$

for $h = 0, 1, \ldots, n-1$. When $(X_t)$ is not assumed to be centered, $X_t$ is replaced by $X_t - \bar{X}_n$. In view of (B.5), we know that, for an AR($p$) process, we have $r_X(h) = 0$ for all $h > p$. When the noise is strong, the asymptotic distribution of the $\hat{r}(h)$, $h > p$, is quite simple.


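Theorem B.1 (Asymptotic distribution of the $\hat{r}(h)$s for a strong AR($p$) model) If $X$ is the nonanticipative stationary solution of the AR($p$) model

$$X_t - \sum_{i=1}^{p}a_iX_{t-i} = \eta_t, \quad \eta_t \text{ iid}(0, \sigma^2), \quad \sigma^2 \neq 0, \quad 1 - \sum_{i=1}^{p}a_iz^i \neq 0 \quad \forall |z| \leq 1,$$

then

$$\sqrt{n}\,\hat{r}(h) \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1), \quad \forall h > p.$$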

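Proof. Let $a_0 = (a_1, \ldots, a_p, 0, \ldots, 0)'$ be the vector of coefficients of the AR($h$) model, when $h > p$. Consider

$$\mathbf{X} = \begin{pmatrix}X_{n-1} & \cdots & X_{n-h}\\ X_{n-2} & \cdots & X_{n-h-1}\\ \vdots & & \vdots\\ X_0 & \cdots & X_{1-h}\end{pmatrix}, \qquad \mathbf{Y} = \begin{pmatrix}X_n\\ X_{n-1}\\ \vdots\\ X_1\end{pmatrix} \qquad \text{and} \qquad \hat{a} = \{\mathbf{X}'\mathbf{X}\}^{-1}\mathbf{X}'\mathbf{Y},$$

the coefficient of the empirical regression of $X_t$ on $X_{t-1}, \ldots, X_{t-h}$ (taking $X_t = 0$ for $t \leq 0$). It can be shown that, as for a standard regression model, $\sqrt{n}(\hat{a} - a_0) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma)$, where

$$\Sigma := \sigma^2\lim_{n \to \infty}n\{\mathbf{X}'\mathbf{X}\}^{-1} = \sigma^2\begin{pmatrix}\gamma_X(0) & \gamma_X(1) & \cdots & \gamma_X(h-1)\\ \gamma_X(1) & \gamma_X(0) & \cdots & \gamma_X(h-2)\\ \vdots & & \ddots & \vdots\\ \gamma_X(h-1) & \cdots & \gamma_X(1) & \gamma_X(0)\end{pmatrix}^{-1}.$$

Since $\hat{r}_X(h)$ is the last component of $\hat{a}$ (by (B.5)), we have

$$\sqrt{n}\,\hat{r}(h) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma(h, h)),$$

with

$$\Sigma(h, h) = \sigma^2\frac{\Delta(0, h-1)}{\Delta(0, h)}, \qquad \Delta(0, j) = \begin{vmatrix}\gamma_X(0) & \gamma_X(1) & \cdots & \gamma_X(j-1)\\ \gamma_X(1) & \gamma_X(0) & \cdots & \gamma_X(j-2)\\ \vdots & & \ddots & \vdots\\ \gamma_X(j-1) & \cdots & \gamma_X(1) & \gamma_X(0)\end{vmatrix}.$$

Applying the relations

$$\gamma_X(0) - \sum_{i=1}^{h-1}a_i\gamma_X(i) = \sigma^2, \qquad \gamma_X(k) - \sum_{i=1}^{h-1}a_i\gamma_X(k-i) = 0,$$

for $k = 1, \ldots, h-1$, we obtain (subtracting from the last column of $\Delta(0, h)$ the columns $h-i$ multiplied by $a_i$, $i = 1, \ldots, h-1$, which does not change the determinant)

$$\Delta(0, h) = \begin{vmatrix}\gamma_X(0) & \gamma_X(1) & \cdots & \gamma_X(h-2) & 0\\ \gamma_X(1) & \gamma_X(0) & \cdots & \gamma_X(h-3) & 0\\ \vdots & & & \vdots & \vdots\\ \gamma_X(h-2) & \gamma_X(h-3) & \cdots & \gamma_X(0) & 0\\ \gamma_X(h-1) & \gamma_X(h-2) & \cdots & \gamma_X(1) & \gamma_X(0) - \sum_{i=1}^{h-1}a_i\gamma_X(i)\end{vmatrix} = \sigma^2\Delta(0, h-1).$$

Thus $\Sigma(h, h) = 1$, which completes the proof.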
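Theorem B.1 can be illustrated by simulation. The Python sketch below is only an illustration, not part of the argument: it assumes an AR(1) with $a_1 = 0.5$ driven by iid $\mathcal{N}(0, 1)$ noise, $n = 1000$, a burn-in of 200 observations, 500 replications and an arbitrary seed. It computes $\hat{r}(2)$ from the empirical autocorrelations and Durbin's recursions, and looks at the mean and variance of $\sqrt{n}\,\hat{r}(2)$, which should be close to 0 and 1 since $h = 2 > p = 1$. The helper names `sample_acf` and `durbin_pacf` are ours.

```python
import numpy as np


def sample_acf(x, max_lag):
    """rho_hat(1), ..., rho_hat(max_lag): mean-corrected, 1/n normalisation."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    gamma = np.array([np.dot(x[: n - h], x[h:]) / n for h in range(max_lag + 1)])
    return gamma[1:] / gamma[0]


def durbin_pacf(rho):
    """r(1), ..., r(h) from rho(1), ..., rho(h) via the recursions (B.7)-(B.9)."""
    rho = np.asarray(rho, dtype=float)
    a = rho[:1].copy()                                   # (a_{1,1}) = (rho(1),)
    pacf = [rho[0]]
    for k in range(2, len(rho) + 1):
        num = rho[k - 1] - np.dot(rho[k - 2::-1], a)     # rho(k-1), ..., rho(1)
        den = 1.0 - np.dot(rho[: k - 1], a)              # rho(1), ..., rho(k-1)
        a_kk = num / den                                 # (B.8)
        a = np.append(a - a_kk * a[::-1], a_kk)          # (B.9), then a_{k,k}
        pacf.append(a_kk)
    return np.array(pacf)


rng = np.random.default_rng(2024)
phi, n, burn, n_rep, h = 0.5, 1000, 200, 500, 2
stats = np.empty(n_rep)
for rep in range(n_rep):
    eta = rng.standard_normal(n + burn)                  # iid N(0, 1): a strong noise
    x = np.empty(n + burn)
    x[0] = eta[0]
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + eta[t]
    stats[rep] = np.sqrt(n) * durbin_pacf(sample_acf(x[burn:], h))[h - 1]

print(stats.mean(), stats.var())                         # roughly 0 and 1
```

Replacing the iid noise by a GARCH noise would typically move the variance away from 1, which is the point of the remark following the theorem.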


The result of Theorem B.1 is no longer valid without the assumption that the noise $\eta_t$ is iid. We can, however, obtain the asymptotic behavior of the $\hat{r}(h)$s from that of the $\hat{\rho}(h)$s. Let

$$\rho_m = (\rho_X(1), \ldots, \rho_X(m)), \qquad \hat{\rho}_m = (\hat{\rho}_X(1), \ldots, \hat{\rho}_X(m)),$$

$$r_m = (r_X(1), \ldots, r_X(m)), \qquad \hat{r}_m = (\hat{r}_X(1), \ldots, \hat{r}_X(m)).$$


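Theorem B.2 (Distribution of the $\hat{r}(h)$s from that of the $\hat{\rho}(h)$s) When $n \to \infty$, if

$$\sqrt{n}(\hat{\rho}_m - \rho_m) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma_{\hat{\rho}_m}),$$

then

$$\sqrt{n}(\hat{r}_m - r_m) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma_{\hat{r}_m}), \qquad \Sigma_{\hat{r}_m} = J_m\Sigma_{\hat{\rho}_m}J_m',$$

where the elements of the Jacobian matrix $J_m$ are defined by $J_m(i, j) = \partial r_X(i)/\partial\rho_X(j)$ and are recursively obtained for $k = 2, \ldots, m$ by

$$\frac{\partial r_X(1)}{\partial\rho_X(j)} = a_{1,1}^{(j)} = \mathbb{1}_{\{1\}}(j), \qquad \frac{\partial r_X(k)}{\partial\rho_X(j)} = a_{k,k}^{(j)} = \frac{n_k^{(j)}d_k - n_kd_k^{(j)}}{d_k^2},$$

$$n_k = \rho_X(k) - \sum_{i=1}^{k-1}\rho_X(k-i)a_{k-1,i}, \qquad d_k = 1 - \sum_{i=1}^{k-1}\rho_X(i)a_{k-1,i},$$

$$n_k^{(j)} = \mathbb{1}_{\{k\}}(j) - a_{k-1,k-j} - \sum_{i=1}^{k-1}\rho_X(k-i)a_{k-1,i}^{(j)}, \qquad d_k^{(j)} = -a_{k-1,j} - \sum_{i=1}^{k-1}\rho_X(i)a_{k-1,i}^{(j)},$$

$$a_{k,i}^{(j)} = a_{k-1,i}^{(j)} - a_{k,k}^{(j)}a_{k-1,k-i} - a_{k,k}a_{k-1,k-i}^{(j)}, \quad i = 1, \ldots, k-1,$$

where $a_{i,j} = 0$ for $j \leq 0$ or $j > i$.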
Proof. It suffices to apply the delta method,² considering $r_X(h)$ as a differentiable function of $\rho_X(1), \ldots, \rho_X(h)$.


It follows that for a GARCH, and more generally for a weak white noise, $\hat{\rho}(h)$ and $\hat{r}(h)$ have the same asymptotic distribution.


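Theorem B.3 (Asymptotic distribution of $\hat{r}(h)$ and $\hat{\rho}(h)$ for weak noises) If $X$ is a weak white noise and

$$\sqrt{n}\,\hat{\rho}_m \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma_{\hat{\rho}_m}),$$

then

$$\sqrt{n}\,\hat{r}_m \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma_{\hat{\rho}_m}).$$

Proof. Consider the calculation of the derivatives $a_{k,i}^{(j)}$ when $\rho_X(h) = 0$ for all $h \neq 0$. It is clear that $a_{k,i} = 0$ for all $k$ and all $i$. We then have $d_k = 1$, $n_k = 0$ and $n_k^{(j)} = \mathbb{1}_{\{k\}}(j)$. We thus have $a_{k,k}^{(j)} = \mathbb{1}_{\{k\}}(j)$, and then $J_m = I_m$.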
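The conclusion $J_m = I_m$ can also be checked numerically. The Python sketch below approximates $J_m(i, j) = \partial r_X(i)/\partial\rho_X(j)$ by central finite differences of Durbin's recursions rather than by the exact recursive formulas of Theorem B.2; the names `durbin_pacf`, `jacobian_fd` and the step `eps` are our own choices.

```python
import numpy as np


def durbin_pacf(rho):
    """r(1), ..., r(h) from rho(1), ..., rho(h) via the recursions (B.7)-(B.9)."""
    rho = np.asarray(rho, dtype=float)
    a = rho[:1].copy()                                   # (a_{1,1}) = (rho(1),)
    pacf = [rho[0]]
    for k in range(2, len(rho) + 1):
        num = rho[k - 1] - np.dot(rho[k - 2::-1], a)     # rho(k-1), ..., rho(1)
        den = 1.0 - np.dot(rho[: k - 1], a)              # rho(1), ..., rho(k-1)
        a_kk = num / den                                 # (B.8)
        a = np.append(a - a_kk * a[::-1], a_kk)          # (B.9), then a_{k,k}
        pacf.append(a_kk)
    return np.array(pacf)


def jacobian_fd(rho, eps=1e-6):
    """Central finite-difference approximation of J(i, j) = d r_X(i) / d rho_X(j)."""
    rho = np.asarray(rho, dtype=float)
    m = len(rho)
    J = np.empty((m, m))
    for j in range(m):
        step = np.zeros(m)
        step[j] = eps
        J[:, j] = (durbin_pacf(rho + step) - durbin_pacf(rho - step)) / (2 * eps)
    return J


# White noise: rho_X(h) = 0 for h != 0, so J_m should be close to the identity
print(np.round(jacobian_fd(np.zeros(4)), 6))
```

At a non-white-noise point, such as the AR(1) autocorrelations $\rho_X(h) = 0.6^h$, the same function returns a lower triangular matrix, because $r_X(i)$ depends only on $\rho_X(1), \ldots, \rho_X(i)$.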


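The following result is stronger because it shows that, for a white noise, $\hat{\rho}(h)$ and $\hat{r}(h)$ are asymptotically equivalent.


Theorem B.4 (Equivalence between $\hat{r}(h)$ and $\hat{\rho}(h)$ for a weak noise) If $(X_t)$ is a weak white noise satisfying the condition of Theorem B.3 and, for all fixed $h$,

$$\sqrt{n}(\hat{a}_{h-1,1}, \ldots, \hat{a}_{h-1,h-1}) = O_P(1), \tag{B.11}$$

where $(\hat{a}_{h-1,1}, \ldots, \hat{a}_{h-1,h-1})$ is the vector of estimated coefficients of the linear regression of $X_t$ on $X_{t-1}, \ldots, X_{t-h+1}$ ($t = h, \ldots, n$), then

$$\hat{\rho}(h) - \hat{r}(h) = O_P(n^{-1}).$$

Proof. The result is straightforward for $h = 1$. For $h > 1$, we have by (B.8),

$$\hat{r}(h) = \frac{\hat{\rho}(h) - \sum_{i=1}^{h-1}\hat{\rho}(h-i)\hat{a}_{h-1,i}}{1 - \sum_{i=1}^{h-1}\hat{\rho}(i)\hat{a}_{h-1,i}}.$$

In view of the assumptions,

$$\hat{\rho}(k) = o_P(1), \qquad (\hat{a}_{h-1,1}, \ldots, \hat{a}_{h-1,h-1}) = o_P(1)$$

and

$$\hat{\rho}(k)\hat{a}_{h-1,i} = O_P(n^{-1}),$$

for $i = 1, \ldots, h-1$ and $k = 1, \ldots, h$. Thus

$$n\{\hat{\rho}(h) - \hat{r}(h)\} = \frac{n\sum_{i=1}^{h-1}\hat{a}_{h-1,i}\{\hat{\rho}(h-i) - \hat{\rho}(h)\hat{\rho}(i)\}}{1 - \sum_{i=1}^{h-1}\hat{\rho}(i)\hat{a}_{h-1,i}} = O_P(1).$$

Under mild assumptions, the left-hand side of (B.11) tends in law to a nondegenerate normal, which entails (B.11).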


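² If $\sqrt{n}(X_n - \mu) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma)$, for $X_n$ in $\mathbb{R}^m$, and $g: \mathbb{R}^m \to \mathbb{R}^k$ is of class $C^1$ in a neighborhood of $\mu$, then $\sqrt{n}\{g(X_n) - g(\mu)\} \xrightarrow{\mathcal{L}} \mathcal{N}(0, J\Sigma J')$, where $J = \{\partial g(x)/\partial x'\}(\mu)$.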
B.2 Generalized Bartlett Formula for Nonlinear Processes


Let $X_1, \ldots, X_n$ be observations of a centered second-order stationary process $X = (X_t)$. The empirical autocovariances and autocorrelations are defined by

$$\hat{\gamma}_X(h) = \hat{\gamma}_X(-h) = \frac{1}{n}\sum_{t=1}^{n-h}X_tX_{t+h}, \qquad \hat{\rho}_X(h) = \hat{\rho}_X(-h) = \frac{\hat{\gamma}_X(h)}{\hat{\gamma}_X(0)}, \tag{B.12}$$

for $h = 0, \ldots, n-1$. The following theorem provides an expression for the asymptotic variance


of these estimators. This expression, which will be called Bartlett’s formula, is relatively easy to 
compute. For the empirical autocorrelations of (strongly) linear processes, we obtain the standard 
Bartlett formula, involving only the theoretical autocorrelation function of the observed process. 
For nonlinear processes admitting a weakly linear representation, the generalized Bartlett formula 
involves the autocorrelation function of the observed process, the kurtosis of the linear innovation 
process and the autocorrelation function of its square. This formula is obtained under a symmetry 
assumption on the linear innovation process. 


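Theorem B.5 (Bartlett's formula for weakly linear processes) We assume that $(X_t)_{t \in \mathbb{Z}}$ admits the Wold representation

$$X_t = \sum_{i=-\infty}^{\infty}\psi_i\epsilon_{t-i}, \qquad \sum_{i=-\infty}^{\infty}|\psi_i| < \infty,$$

where $(\epsilon_t)_{t \in \mathbb{Z}}$ is a weak white noise such that $E\epsilon_t^4 := \kappa_\epsilon(E\epsilon_t^2)^2 < \infty$, and

$$E\epsilon_{t_1}\epsilon_{t_2}\epsilon_{t_3}\epsilon_{t_4} = 0 \quad \text{when } t_1 \neq t_2,\ t_1 \neq t_3 \text{ and } t_1 \neq t_4. \tag{B.13}$$

With the notation $\rho_{\epsilon^2} = \sum_{h=-\infty}^{+\infty}\rho_{\epsilon^2}(h)$, we have

$$\begin{aligned}\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(i), \hat{\gamma}_X(j)\} &= (\kappa_\epsilon - 3)\gamma_X(i)\gamma_X(j) + \sum_{\ell=-\infty}^{\infty}\gamma_X(\ell)\{\gamma_X(\ell + j - i) + \gamma_X(\ell - j - i)\}\\ &\quad + (\rho_{\epsilon^2} - 3)(\kappa_\epsilon - 1)\gamma_X(i)\gamma_X(j)\\ &\quad + (\kappa_\epsilon - 1)\sum_{\ell=-\infty}^{\infty}\gamma_X(\ell - i)\{\gamma_X(\ell - j) + \gamma_X(\ell + j)\}\rho_{\epsilon^2}(\ell).\end{aligned} \tag{B.14}$$

If

$$\sqrt{n}(\hat{\gamma}_{0:m} - \gamma_{0:m}) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma_{\hat{\gamma}_{0:m}}) \quad \text{as } n \to \infty,$$

where the elements of $\Sigma_{\hat{\gamma}_{0:m}}$ are given by (B.14), then

$$\sqrt{n}(\hat{\rho}_m - \rho_m) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma_{\hat{\rho}_m}),$$

where the elements of $\Sigma_{\hat{\rho}_m}$ are given by the generalized Bartlett formula

$$\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\rho}_X(i), \hat{\rho}_X(j)\} = v_{ij} + v_{ij}^*, \quad i > 0,\ j > 0, \tag{B.15}$$

$$v_{ij} = \sum_{\ell=-\infty}^{\infty}\rho_X(\ell)[2\rho_X(i)\rho_X(j)\rho_X(\ell) - 2\rho_X(i)\rho_X(\ell + j) - 2\rho_X(j)\rho_X(\ell + i) + \rho_X(\ell + j - i) + \rho_X(\ell - j - i)],$$

$$v_{ij}^* = (\kappa_\epsilon - 1)\sum_{\ell=-\infty}^{\infty}\rho_{\epsilon^2}(\ell)[2\rho_X(i)\rho_X(j)\rho_X^2(\ell) - 2\rho_X(j)\rho_X(\ell)\rho_X(\ell + i) - 2\rho_X(i)\rho_X(\ell)\rho_X(\ell + j) + \rho_X(\ell + i)\{\rho_X(\ell + j) + \rho_X(\ell - j)\}].$$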


Remark B.1 (Symmetry condition (B.13) for a GARCH process) In view of (7.24), we know that if $(\epsilon_t)$ is a GARCH process with a symmetric distribution for $\eta_t$ and if $E\epsilon_t^4 < \infty$, then (B.13) is satisfied (see Exercise 7.12).


Remark B.2 (The generalized formula contains the standard formula) The right-hand side of (B.14) is a sum of four terms. When the sequence $(\epsilon_t^2)$ is uncorrelated, the sum of the last two terms is equal to

$$-2(\kappa_\epsilon - 1)\gamma_X(i)\gamma_X(j) + (\kappa_\epsilon - 1)\gamma_X(i)\{\gamma_X(j) + \gamma_X(-j)\} = 0.$$

In this case, we retrieve the standard Bartlett formula (1.1). We also have $v_{ij}^* = 0$, and we retrieve the standard Bartlett formula for the $\hat{\rho}(h)$s.


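Example B.1 If we have a white noise $X_t = \epsilon_t$ that satisfies the assumptions of the theorem, then we have the generalized Bartlett formula (B.15) with

$$v_{ij} = v_{ij}^* = 0 \quad \text{for } i \neq j, \qquad v_{ii} = 1 \quad \text{and} \quad v_{ii}^* = \frac{\gamma_{\epsilon^2}(i)}{\gamma_\epsilon^2(0)} \quad \text{for } i > 0.$$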
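The generalized Bartlett formula (B.15) is easy to evaluate numerically once $\rho_X(\cdot)$, $\rho_{\epsilon^2}(\cdot)$ and $\kappa_\epsilon$ are specified. The Python sketch below truncates the infinite sums to $|\ell| \leq L$ (an approximation we introduce; $L = 200$ is an arbitrary default) and takes the two autocorrelation functions as callables. The function name `generalized_bartlett` is hypothetical.

```python
def generalized_bartlett(rho_x, rho_e2, kappa, i, j, L=200):
    """v_ij + v*_ij of (B.15), with the sums over l truncated to |l| <= L.
    rho_x(l) and rho_e2(l) return rho_X(l) and rho_{eps^2}(l); kappa is kappa_eps."""
    v = 0.0
    v_star = 0.0
    for l in range(-L, L + 1):
        v += rho_x(l) * (2 * rho_x(i) * rho_x(j) * rho_x(l)
                         - 2 * rho_x(i) * rho_x(l + j)
                         - 2 * rho_x(j) * rho_x(l + i)
                         + rho_x(l + j - i) + rho_x(l - j - i))
        v_star += rho_e2(l) * (2 * rho_x(i) * rho_x(j) * rho_x(l) ** 2
                               - 2 * rho_x(j) * rho_x(l) * rho_x(l + i)
                               - 2 * rho_x(i) * rho_x(l) * rho_x(l + j)
                               + rho_x(l + i) * (rho_x(l + j) + rho_x(l - j)))
    return v + (kappa - 1) * v_star


def white(l):
    """Autocorrelation function of a white noise."""
    return 1.0 if l == 0 else 0.0


# Example B.1 with the illustrative choices rho_{eps^2}(l) = 0.3**|l| and kappa = 3:
# v_11 + v*_11 = 1 + (kappa - 1) * rho_{eps^2}(1) = 1.6
print(generalized_bartlett(white, lambda l: 0.3 ** abs(l), 3.0, 1, 1))
```

With $\rho_X(\ell) = 0.5^{|\ell|}$ and uncorrelated $\epsilon_t^2$ (the situation of Remark B.2), the same function returns, up to rounding, the standard Bartlett value $1 - 0.5^2 = 0.75$ for $i = j = 1$, since $v_{11}^* = 0$.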


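Proof. Using (B.13) and the notation $\psi_{i_1,i_2,i_3,i_4} = \psi_{i_1}\psi_{i_2}\psi_{i_3}\psi_{i_4}$, we obtain

$$\begin{aligned}EX_tX_{t+i}X_{t+h}X_{t+j+h} &= \sum_{i_1,i_2,i_3,i_4}\psi_{i_1,i_2,i_3,i_4}E\epsilon_{t-i_1}\epsilon_{t+i-i_2}\epsilon_{t+h-i_3}\epsilon_{t+j+h-i_4}\\ &= \sum_{i_1,i_3}\psi_{i_1,i_1+i,i_3,i_3+j}E\epsilon_{t-i_1}^2\epsilon_{t+h-i_3}^2 + \sum_{i_1,i_2}\psi_{i_1,i_1+h,i_2,i_2+h+j-i}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2\\ &\quad + \sum_{i_1,i_2}\psi_{i_1,i_1+h+j,i_2,i_2+h-i}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2 - 2E\epsilon_t^4\sum_{i_1}\psi_{i_1,i_1+i,i_1+h,i_1+h+j}.\end{aligned} \tag{B.16}$$

The last equality is obtained by summing over the $i_1, i_2, i_3, i_4$ such that the indices of $\{\epsilon_{t-i_1}, \epsilon_{t+i-i_2}, \epsilon_{t+h-i_3}, \epsilon_{t+j+h-i_4}\}$ are pairwise equal, which corresponds to three sums, and then by subtracting twice the sum in which the indices of $\{\epsilon_{t-i_1}, \epsilon_{t+i-i_2}, \epsilon_{t+h-i_3}, \epsilon_{t+j+h-i_4}\}$ are all equal (the latter sum being counted three times in the first three sums). We also have

$$\gamma_X(i) = \sum_{i_1,i_2}\psi_{i_1}\psi_{i_2}E\epsilon_{t-i_1}\epsilon_{t+i-i_2} = \gamma_\epsilon(0)\sum_{i_1}\psi_{i_1}\psi_{i_1+i}. \tag{B.17}$$

By stationarity and the dominated convergence theorem, it is easily shown that

$$\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(i), \hat{\gamma}_X(j)\} = \sum_{h=-\infty}^{\infty}\mathrm{Cov}\{X_tX_{t+i}, X_{t+h}X_{t+j+h}\}.$$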
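The existence of this sum is guaranteed by the conditions

$$\sum_i|\psi_i| < \infty \quad \text{and} \quad \sum_h|\rho_{\epsilon^2}(h)| < \infty.$$

In view of (B.16) and (B.17), this sum is equal to

$$\begin{aligned}&\sum_{i_1,i_3}\psi_{i_1,i_1+i,i_3,i_3+j}\sum_h\{E\epsilon_{t-i_1}^2\epsilon_{t+h-i_3}^2 - \gamma_\epsilon^2(0)\} + \sum_{h,i_1,i_2}\psi_{i_1,i_1+h,i_2,i_2+h+j-i}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2\\ &\quad + \sum_{h,i_1,i_2}\psi_{i_1,i_1+h+j,i_2,i_2+h-i}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2 - 2E\epsilon_t^4\sum_{h,i_1}\psi_{i_1,i_1+i,i_1+h,i_1+h+j}\\ &= \rho_{\epsilon^2}\gamma_{\epsilon^2}(0)\sum_{i_1}\psi_{i_1}\psi_{i_1+i}\sum_{i_3}\psi_{i_3}\psi_{i_3+j} + \sum_{i_1,i_2}\psi_{i_1}\psi_{i_2}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2\sum_h\psi_{i_1+h}\psi_{i_2+h+j-i}\\ &\quad + \sum_{i_1,i_2}\psi_{i_1}\psi_{i_2}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2\sum_h\psi_{i_1+h+j}\psi_{i_2+h-i} - 2E\epsilon_t^4\sum_{i_1}\psi_{i_1}\psi_{i_1+i}\sum_h\psi_{i_1+h}\psi_{i_1+h+j},\end{aligned}$$

using Fubini's theorem. With the notation $E\epsilon_t^2 = \gamma_\epsilon(0)$, and using once again (B.17) and the relation $\gamma_{\epsilon^2}(0) = (\kappa_\epsilon - 1)\gamma_\epsilon^2(0)$, we obtain

$$\begin{aligned}\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(i), \hat{\gamma}_X(j)\} &= \gamma_\epsilon^{-2}(0)\rho_{\epsilon^2}\gamma_{\epsilon^2}(0)\gamma_X(i)\gamma_X(j) + \sum_{i_1,i_2}\psi_{i_1}\psi_{i_2}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2\gamma_\epsilon^{-1}(0)\gamma_X(i_2 + j - i - i_1)\\ &\quad + \sum_{i_1,i_2}\psi_{i_1}\psi_{i_2}E\epsilon_{t-i_1}^2\epsilon_{t+i-i_2}^2\gamma_\epsilon^{-1}(0)\gamma_X(i_2 - j - i - i_1) - 2E\epsilon_t^4\gamma_\epsilon^{-2}(0)\gamma_X(i)\gamma_X(j)\\ &= \{(\kappa_\epsilon - 1)\rho_{\epsilon^2} - 2\kappa_\epsilon\}\gamma_X(i)\gamma_X(j)\\ &\quad + \gamma_\epsilon^{-1}(0)\sum_{i_1,i_2}\psi_{i_1}\psi_{i_2}\{\gamma_X(i_2 + j - i - i_1) + \gamma_X(i_2 - j - i - i_1)\}\{\gamma_{\epsilon^2}(i - i_2 + i_1) + \gamma_\epsilon^2(0)\}.\end{aligned}$$

With the change of index $\ell = i_2 - i_1$, we finally obtain

$$\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(i), \hat{\gamma}_X(j)\} = \{(\kappa_\epsilon - 1)\rho_{\epsilon^2} - 2\kappa_\epsilon\}\gamma_X(i)\gamma_X(j) + \gamma_\epsilon^{-2}(0)\sum_{\ell=-\infty}^{\infty}\gamma_X(\ell)\{\gamma_X(\ell + j - i) + \gamma_X(\ell - j - i)\}\{\gamma_{\epsilon^2}(i - \ell) + \gamma_\epsilon^2(0)\}, \tag{B.18}$$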
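which can also be written in the form (B.14) (see Exercise 1.11). The vector $(\hat{\rho}_X(i), \hat{\rho}_X(j))$ is a function of the triplet $(\hat{\gamma}_X(0), \hat{\gamma}_X(i), \hat{\gamma}_X(j))$. The Jacobian of this differentiable transformation is

$$J = \begin{pmatrix}-\dfrac{\gamma_X(i)}{\gamma_X^2(0)} & \dfrac{1}{\gamma_X(0)} & 0\\ -\dfrac{\gamma_X(j)}{\gamma_X^2(0)} & 0 & \dfrac{1}{\gamma_X(0)}\end{pmatrix}.$$

Let $\Sigma$ be the matrix of asymptotic variance of the triplet, whose elements are given by (B.18). By the delta method, we obtain

$$\begin{aligned}\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\rho}(i), \hat{\rho}(j)\} &= J\Sigma J'(1, 2)\\ &= \frac{\gamma_X(i)\gamma_X(j)}{\gamma_X^4(0)}\Sigma(1, 1) - \frac{\gamma_X(i)}{\gamma_X^3(0)}\Sigma(1, 3) - \frac{\gamma_X(j)}{\gamma_X^3(0)}\Sigma(2, 1) + \frac{\Sigma(2, 3)}{\gamma_X^2(0)}\\ &= \{(\kappa_\epsilon - 1)\rho_{\epsilon^2} - 2\kappa_\epsilon\}\left\{\frac{\gamma_X(i)\gamma_X(j)}{\gamma_X^2(0)} - \frac{\gamma_X(i)\gamma_X(j)}{\gamma_X^2(0)} - \frac{\gamma_X(j)\gamma_X(i)}{\gamma_X^2(0)} + \frac{\gamma_X(i)\gamma_X(j)}{\gamma_X^2(0)}\right\}\\ &\quad + \gamma_\epsilon^{-2}(0)\sum_{\ell=-\infty}^{\infty}\left[\frac{\gamma_X(i)\gamma_X(j)}{\gamma_X^4(0)}2\gamma_X^2(\ell)\{\gamma_{\epsilon^2}(\ell) + \gamma_\epsilon^2(0)\} - \frac{\gamma_X(i)}{\gamma_X^3(0)}\gamma_X(\ell)\{\gamma_X(\ell + j) + \gamma_X(\ell - j)\}\{\gamma_{\epsilon^2}(\ell) + \gamma_\epsilon^2(0)\}\right.\\ &\quad \left. - \frac{\gamma_X(j)}{\gamma_X^3(0)}2\gamma_X(\ell)\gamma_X(\ell - i)\{\gamma_{\epsilon^2}(i - \ell) + \gamma_\epsilon^2(0)\} + \frac{\gamma_X(\ell)}{\gamma_X^2(0)}\{\gamma_X(\ell + j - i) + \gamma_X(\ell - j - i)\}\{\gamma_{\epsilon^2}(i - \ell) + \gamma_\epsilon^2(0)\}\right],\end{aligned}$$

where the factor multiplying $\{(\kappa_\epsilon - 1)\rho_{\epsilon^2} - 2\kappa_\epsilon\}$ is equal to zero. Simplifying and using the autocorrelations, the previous quantity is equal to

$$\begin{aligned}&\sum_{\ell=-\infty}^{\infty}\left[2\rho_X(i)\rho_X(j)\rho_X^2(\ell) - \rho_X(i)\rho_X(\ell)\{\rho_X(\ell + j) + \rho_X(\ell - j)\} - \rho_X(j)\rho_X(\ell)\{\rho_X(\ell - i) + \rho_X(\ell - i)\} + \rho_X(\ell)\{\rho_X(\ell + j - i) + \rho_X(\ell - j - i)\}\right]\\ &+ (\kappa_\epsilon - 1)\sum_{\ell=-\infty}^{\infty}\rho_{\epsilon^2}(\ell)\left[2\rho_X(i)\rho_X(j)\rho_X^2(\ell) - \rho_X(i)\rho_X(\ell)\{\rho_X(\ell + j) + \rho_X(\ell - j)\} - \rho_X(j)\rho_X(\ell - i)\{\rho_X(\ell) + \rho_X(\ell)\} + \rho_X(i - \ell)\{\rho_X(-\ell + j) + \rho_X(-\ell - j)\}\right].\end{aligned}$$

Noting that

$$\sum_{\ell}\rho_{\epsilon^2}(\ell)\rho_X(\ell)\rho_X(\ell + j) = \sum_{\ell}\rho_{\epsilon^2}(\ell)\rho_X(\ell)\rho_X(\ell - j),$$

we obtain

$$\begin{aligned}\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\rho}(i), \hat{\rho}(j)\} &= \sum_{\ell=-\infty}^{\infty}\rho_X(\ell)[2\rho_X(i)\rho_X(j)\rho_X(\ell) - 2\rho_X(i)\rho_X(\ell + j) - 2\rho_X(j)\rho_X(\ell + i) + \rho_X(\ell + j - i) + \rho_X(\ell - j - i)]\\ &\quad + (\kappa_\epsilon - 1)\sum_{\ell=-\infty}^{\infty}\rho_{\epsilon^2}(\ell)[2\rho_X(i)\rho_X(j)\rho_X^2(\ell) - 2\rho_X(i)\rho_X(\ell)\rho_X(\ell + j) - 2\rho_X(j)\rho_X(\ell)\rho_X(\ell + i) + \rho_X(\ell + i)\{\rho_X(\ell + j) + \rho_X(\ell - j)\}].\end{aligned}$$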
Appendix C


Solutions to the Exercises 


Chapter 1 


1.1 1. (a) We have the stationary solution $X_t = \sum_{i \geq 0}0.5^i(\eta_{t-i} + 1)$, with mean $EX_t = 2$ and autocorrelations $\rho_X(h) = 0.5^{|h|}$.
(b) We have an 'anticipative' stationary solution

$$X_t = -1 - \frac{1}{2}\sum_{i \geq 0}0.5^i\eta_{t+i+1},$$

which is such that $EX_t = -1$ and $\rho_X(h) = 0.5^{|h|}$.
(c) The stationary solution

$$X_t = 2 + \sum_{i \geq 0}0.5^i(\eta_{t-i} - 0.4\eta_{t-i-1})$$

is such that $EX_t = 2$ with $\rho_X(1) = 2/19$ and $\rho_X(h) = 0.5^{h-1}\rho_X(1)$ for $h > 1$.
2. The compatible models are respectively ARMA(1, 2), MA(3) and ARMA(1, 1).


3. The first noise is strong, and the second is weak because

$$\mathrm{Cov}\{(\eta_t\eta_{t-1})^2, (\eta_{t-1}\eta_{t-2})^2\} = E\eta_t^2\eta_{t-1}^4\eta_{t-2}^2 - 1 \neq 0.$$

Note that, by Jensen's inequality, this correlation is positive.


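1.2 Without loss of generality, assume $X_t = \bar{X}_n$ for $t < 1$ or $t > n$. We have

$$\sum_{h=-n+1}^{n-1}\hat{\gamma}(h) = \frac{1}{n}\sum_{h,t}(X_t - \bar{X}_n)(X_{t+h} - \bar{X}_n) = \frac{1}{n}\left\{\sum_{t=1}^{n}(X_t - \bar{X}_n)\right\}^2 = 0,$$

which gives $1 + 2\sum_{h=1}^{n-1}\hat{\rho}(h) = 0$, and the result follows.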
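The identity $\sum_{h=1}^{n-1}\hat{\rho}(h) = -1/2$ holds for any non-constant series, which a few lines of Python confirm. The sketch assumes, as in the solution, that the data are mean-corrected and that every autocovariance uses the $1/n$ normalisation (with $1/(n-h)$ the identity would fail); the name `sample_acf` is ours.

```python
import random


def sample_acf(x):
    """Empirical autocorrelations rho_hat(1), ..., rho_hat(n-1): the series is
    mean-corrected and every autocovariance uses the 1/n normalisation."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    gamma = [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(n)]
    return [g / gamma[0] for g in gamma[1:]]


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]   # any non-constant series will do
print(sum(sample_acf(x)))                      # equals -1/2 up to rounding error
```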
1.3 Consider the degenerate sequence $(X_t)_{t=0,1,\ldots}$ defined, on a probability space $(\Omega, \mathcal{A}, P)$, by $X_t(\omega) = (-1)^t$ for all $\omega \in \Omega$ and all $t \geq 0$. With probability 1, the sequence $\{(-1)^t\}$ is the realization of the process $(X_t)$. This process is nonstationary because, for instance, $EX_0 \neq EX_1$.
Let $U$ be a random variable, uniformly distributed on $\{0, 1\}$. We define the process $(Y_t)_{t=0,1,\ldots}$ by

$$Y_t(\omega) = (-1)^{t + U(\omega)}$$

for any $\omega \in \Omega$ and any $t \geq 0$. The process $(Y_t)$ is stationary. We have in particular $EY_t = 0$ and $\mathrm{Cov}(Y_t, Y_{t+h}) = (-1)^h$. With probability 1/2, the realization of the stationary process $(Y_t)$ will be the sequence $\{(-1)^t\}$ (and with probability 1/2, it will be $\{(-1)^{t+1}\}$).

This example leads us to think that it is virtually impossible to determine whether a process is stationary or not, from the observation of only one trajectory, even of infinite length. However, practitioners do not consider $\{(-1)^t\}$ as a potential realization of the stationary process $(Y_t)$. It is more natural, and simpler, to suppose that $\{(-1)^t\}$ is generated by the nonstationary process $(X_t)$.


1.4 The sequence $0, 1, 0, 1, \ldots$ is a realization of the process $X_t = 0.5(1 + (-1)^tA)$, where $A$ is a random variable such that $P[A = 1] = P[A = -1] = 0.5$. It can easily be seen that $(X_t)$ is strictly stationary.


1.5 Let $\Omega^* = \{\omega \mid X_{2t} = 1, X_{2t+1} = 0, \forall t\}$. If $(X_t)$ is ergodic and stationary, the empirical means $n^{-1}\sum_{t=1}^{n}\mathbb{1}_{X_{2t}=1}$ and $n^{-1}\sum_{t=1}^{n}\mathbb{1}_{X_{2t+1}=1}$ both converge to the same limit $P[X_t = 1]$ with probability 1, by the ergodic theorem. For all $\omega \in \Omega^*$ these means are respectively equal to 1 and 0. Thus $P(\Omega^*) = 0$. The probability of such a trajectory is thus equal to zero for any ergodic and stationary process.


1.6 We have $E\epsilon_t = 0$, $\mathrm{Var}\,\epsilon_t = 1$ and $\mathrm{Cov}(\epsilon_t, \epsilon_{t-h}) = 0$ when $h \neq 0$, thus $(\epsilon_t)$ is a weak white noise. We also have $\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-1}^2) = E\eta_t^2\eta_{t-1}^4\cdots\eta_{t-k}^4\eta_{t-k-1}^2 - 1 = 3^k - 1 \neq 0$, thus $\epsilon_t$ and $\epsilon_{t-1}$ are not independent, which shows that $(\epsilon_t)$ is not a strong white noise.

Assume $h > 0$. Define the random variable $\tilde{\rho}(h) = \tilde{\gamma}(h)/\tilde{\gamma}(0)$, where $\tilde{\gamma}(h) = n^{-1}\sum_{t=1}^{n}\epsilon_t\epsilon_{t-h}$. It is easy to see that $\sqrt{n}\hat{\rho}(h)$ has the same asymptotic variance (and also the same asymptotic distribution) as $\sqrt{n}\tilde{\rho}(h)$. Using $\tilde{\gamma}(0) \to 1$, stationarity, and Lebesgue's theorem, this asymptotic variance is equal to

$$\begin{aligned}\mathrm{Var}\sqrt{n}\tilde{\gamma}(h) &= n^{-1}\sum_{t,s=1}^{n}\mathrm{Cov}(\epsilon_t\epsilon_{t-h}, \epsilon_s\epsilon_{s-h})\\ &= n^{-1}\sum_{\ell=-n+1}^{n-1}(n - |\ell|)\mathrm{Cov}(\epsilon_1\epsilon_{1-h}, \epsilon_{1+\ell}\epsilon_{1+\ell-h})\\ &\to \sum_{\ell=-\infty}^{\infty}\mathrm{Cov}(\epsilon_1\epsilon_{1-h}, \epsilon_{1+\ell}\epsilon_{1+\ell-h}) = \begin{cases}3^{k-h+1} & \text{if } 0 < h \leq k,\\ 1 & \text{if } h > k.\end{cases}\end{aligned}$$

This value can be arbitrarily larger than 1, which is the value of the asymptotic variance of the empirical autocorrelations of a strong white noise.


1.7 It is clear that $(\epsilon_t^2)$ is a second-order stationary process. By construction, $\epsilon_t$ and $\epsilon_{t-h}$ are independent when $h > k$, thus $\gamma_{\epsilon^2}(h) := \mathrm{Cov}(\epsilon_t^2, \epsilon_{t-h}^2) = 0$ for all $h > k$. Moreover, $\gamma_{\epsilon^2}(h) = 3^{k+1-h} - 1$, for $h = 0, \ldots, k$. In view of Theorem 1.2, $\epsilon_t^2 - 1$ thus follows an MA($k$) process. In the case $k = 1$, we have

$$\epsilon_t^2 = 1 + u_t + bu_{t-1},$$

where $|b| < 1$ and $(u_t)$ is a white noise of variance $\sigma^2$. The coefficients $b$ and $\sigma^2$ are determined by

$$\gamma_{\epsilon^2}(0) = 8 = \sigma^2(1 + b^2), \qquad \gamma_{\epsilon^2}(1) = 2 = b\sigma^2,$$

which gives $b = 2 - \sqrt{3}$ and $\sigma^2 = 2/b$.
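The last step of Exercise 1.7 amounts to solving $b^2 - cb + 1 = 0$ with $c = \gamma_{\epsilon^2}(0)/\gamma_{\epsilon^2}(1)$ and keeping the root with $|b| < 1$. The Python sketch below does this; the helper name `ma1_from_autocov` is ours, and it assumes $0 < |\gamma(1)| \leq \gamma(0)/2$, which is required for a real invertible solution.

```python
import math


def ma1_from_autocov(g0, g1):
    """Invertible MA(1) parameters (b, sigma2) with sigma2 * (1 + b**2) = g0 and
    b * sigma2 = g1, assuming 0 < |g1| <= g0 / 2."""
    c = g0 / g1                      # (1 + b^2) / b = c, i.e. b^2 - c b + 1 = 0
    b = (c - math.copysign(math.sqrt(c * c - 4.0), c)) / 2.0   # root with |b| <= 1
    return b, g1 / b


k = 1
gamma_sq = [3 ** (k + 1 - h) - 1 for h in range(k + 1)]   # gamma_{eps^2}(0), gamma_{eps^2}(1)
b, sigma2 = ma1_from_autocov(*gamma_sq)
print(gamma_sq, b, sigma2)           # b = 2 - sqrt(3) and sigma2 = 2 / b
```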
1.8 Reasoning as in Exercise 1.6, the asymptotic variance is equal to 


=2 =j 
Pida pii Mien (pm) _| (enn) Foaki 
(Ee)? Nie Nion \ Mi 1 ifO0<h £k. 


Since E (OPED) > (E Wy, for k Æ h the asymptotic variance can be arbitrarily smaller than 
1, which corresponds to the asymptotic variance of the empirical autocorrelations of a strong 
white noise. 

1.9 1. We have

$$\left\|\sum_{k=m}^{n}a^k\eta_{t-k}\right\|_2 \leq \sum_{k=m}^{n}|a|^k\|\eta_t\|_2 \to 0$$

when $n > m$ and $m \to \infty$. The sequence $\{u_t(n)\}_n$, defined by $u_t(n) = \sum_{k=0}^{n}a^k\eta_{t-k}$, is a Cauchy sequence in $L^2$, and thus converges in quadratic mean. A priori,

$$\sum_{k=0}^{\infty}|a^k\eta_{t-k}| = \lim_{n \to \infty}\uparrow\sum_{k=0}^{n}|a^k\eta_{t-k}|$$

exists in $\mathbb{R} \cup \{+\infty\}$. Using Beppo Levi's theorem,

$$E\sum_{k=0}^{\infty}|a^k\eta_{t-k}| = (E|\eta_t|)\sum_{k=0}^{\infty}|a|^k < \infty,$$

which shows that the limit $\sum_{k=0}^{\infty}|a^k\eta_{t-k}|$ is finite almost surely. Thus, as $n \to \infty$, $u_t(n)$ converges, both almost surely and in quadratic mean, to $u_t = \sum_{k=0}^{\infty}a^k\eta_{t-k}$. Since

$$u_t(n) = au_{t-1}(n-1) + \eta_t, \quad \forall n,$$

we obtain, taking the limit as $n \to \infty$ of both sides of the equality, $u_t = au_{t-1} + \eta_t$. This shows that $(X_t) = (u_t)$ is a stationary solution of the AR(1) equation.


Finally, assume the existence of two stationary solutions to the equation $X_t = aX_{t-1} + \eta_t$ and $u_t = au_{t-1} + \eta_t$. If $u_{t_0} \neq X_{t_0}$, then

$$0 < |u_{t_0} - X_{t_0}| = |a|^n|u_{t_0-n} - X_{t_0-n}|, \quad \forall n,$$

which entails

$$\limsup_{n \to \infty}|u_{t_0-n}| = +\infty \quad \text{or} \quad \limsup_{n \to \infty}|X_{t_0-n}| = +\infty.$$

This is in contradiction to the assumption that the two sequences are stationary, which shows the uniqueness of the stationary solution.
2. We have $X_t = \eta_t + a\eta_{t-1} + \cdots + a^k\eta_{t-k} + a^{k+1}X_{t-k-1}$. Since $|a| = 1$,

$$\mathrm{Var}(X_t - a^{k+1}X_{t-k-1}) = (k + 1)\sigma^2 \to \infty$$

as $k \to \infty$. If $(X_t)$ were stationary,

$$\mathrm{Var}(X_t - a^{k+1}X_{t-k-1}) = 2\{\mathrm{Var}X_t - a^{k+1}\mathrm{Cov}(X_t, X_{t-k-1})\},$$

and we would have

$$\lim_{k \to \infty}|\mathrm{Cov}(X_t, X_{t-k-1})| = \infty.$$

This is impossible, because by the Cauchy–Schwarz inequality,

$$|\mathrm{Cov}(X_t, X_{t-k-1})| \leq \mathrm{Var}X_t.$$


3. The argument used in part 1 shows that

$$v_t(n) := -\sum_{k=1}^{n}\frac{1}{a^k}\eta_{t+k} \to v_t := -\sum_{k=1}^{\infty}\frac{1}{a^k}\eta_{t+k}$$

almost surely and in quadratic mean. Since

$$v_t(n) = av_{t-1}(n + 1) + \eta_t$$

for all $n$, $(v_t)$ is a stationary solution (which is called anticipative, because it is a function of the future values of the noise) of the AR(1) equation. The uniqueness of the stationary solution is shown as in part 1.


4. The autocovariance function of the stationary solution is

$$\gamma(0) = \sigma^2\sum_{k=1}^{\infty}\frac{1}{a^{2k}}, \qquad \gamma(h) = \frac{1}{a}\gamma(h - 1), \quad h > 0.$$

We thus have $E\epsilon_t = 0$ and, for all $h > 0$,

$$\mathrm{Cov}(\epsilon_t, \epsilon_{t-h}) = \gamma(h) - \frac{1}{a}\gamma(h - 1) - \frac{1}{a}\gamma(h + 1) + \frac{1}{a^2}\gamma(h) = 0,$$

which confirms that $\epsilon_t$ is a white noise.


1.10 In Figure 1.6(a) we note that several empirical autocorrelations are outside the 95% significance band, which leads us to think that the series may not be the realization of a strong white noise. Inspection of Figure 1.6(b) confirms that the observed series $\epsilon_1, \ldots, \epsilon_n$ cannot be generated by a strong white noise, otherwise the series $\epsilon_1^2, \ldots, \epsilon_n^2$ would also be uncorrelated. Clearly, this is not the case, because several empirical autocorrelations go far beyond the significance band. By contrast, it is plausible that the series is a weak noise. We know that Bartlett's formula giving the limits $\pm 1.96/\sqrt{n}$ is not valid for a weak noise (see Exercises 1.6 and 1.8). On the other hand, we know that the square of a weak noise can be correlated (see Exercise 1.7).
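1.11 Using the relation $\gamma_{\epsilon^2}(0) = (\kappa_\epsilon - 1)\gamma_\epsilon^2(0)$, equation (B.18) can be written as

$$\begin{aligned}\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(i), \hat{\gamma}_X(j)\} &= \{(\kappa_\epsilon - 1)\rho_{\epsilon^2} - 2\kappa_\epsilon\}\gamma_X(i)\gamma_X(j) + \sum_{\ell=-\infty}^{\infty}\gamma_X(\ell)\{\gamma_X(\ell + j - i) + \gamma_X(\ell - j - i)\}\{(\kappa_\epsilon - 1)\rho_{\epsilon^2}(i - \ell) + 1\}\\ &= (\rho_{\epsilon^2} - 3)(\kappa_\epsilon - 1)\gamma_X(i)\gamma_X(j) + (\kappa_\epsilon - 3)\gamma_X(i)\gamma_X(j)\\ &\quad + \sum_{\ell=-\infty}^{\infty}\gamma_X(\ell)\{\gamma_X(\ell + j - i) + \gamma_X(\ell - j - i)\}\\ &\quad + (\kappa_\epsilon - 1)\sum_{\ell=-\infty}^{\infty}\gamma_X(\ell)\{\gamma_X(\ell + j - i) + \gamma_X(\ell - j - i)\}\rho_{\epsilon^2}(i - \ell).\end{aligned}$$

With the change of index $h = i - \ell$, we obtain

$$\sum_{\ell=-\infty}^{\infty}\gamma_X(\ell)\{\gamma_X(\ell + j - i) + \gamma_X(\ell - j - i)\}\rho_{\epsilon^2}(i - \ell) = \sum_{h=-\infty}^{\infty}\gamma_X(-h + i)\{\gamma_X(-h + j) + \gamma_X(-h - j)\}\rho_{\epsilon^2}(h),$$

which gives (B.14), using the parity of the autocovariance functions.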


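1.12 We can assume $i \geq 0$ and $j \geq 0$. Since $\gamma_X(\ell) = \gamma_\epsilon(\ell) = 0$ for all $\ell \neq 0$, formula (B.18) yields

$$\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(i), \hat{\gamma}_X(j)\} = \gamma_\epsilon^{-2}(0)\gamma_X(0)\{\gamma_X(j - i) + \gamma_X(-j - i)\}\{\gamma_{\epsilon^2}(i) + \gamma_\epsilon^2(0)\}$$

for $(i, j) \neq (0, 0)$ and

$$\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(0), \hat{\gamma}_X(0)\} = \{(\kappa_\epsilon - 1)\rho_{\epsilon^2} - 2\kappa_\epsilon\}\gamma_\epsilon^2(0) + 2\{\gamma_{\epsilon^2}(0) + \gamma_\epsilon^2(0)\}.$$

Thus

$$\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\gamma}_X(i), \hat{\gamma}_X(j)\} = \begin{cases}0 & \text{if } i \neq j,\\ E\epsilon_t^2\epsilon_{t+i}^2 & \text{if } i = j \neq 0,\\ \gamma_{\epsilon^2}(0)\rho_{\epsilon^2} & \text{if } i = j = 0.\end{cases}$$

In formula (B.15), we have $v_{ij} = 0$ when $i \neq j$ and $v_{ii} = 1$. We also have $v_{ij}^* = 0$ when $i \neq j$ and $v_{ii}^* = (\kappa_\epsilon - 1)\rho_{\epsilon^2}(i)$ for all $i \neq 0$. Since $(\kappa_\epsilon - 1) = \gamma_{\epsilon^2}(0)\gamma_\epsilon^{-2}(0)$, we obtain

$$\lim_{n \to \infty}n\mathrm{Cov}\{\hat{\rho}_X(i), \hat{\rho}_X(j)\} = \begin{cases}0 & \text{if } i \neq j,\\ \dfrac{E\epsilon_t^2\epsilon_{t+i}^2}{\gamma_\epsilon^2(0)} & \text{if } i = j \neq 0.\end{cases}$$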


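For significance intervals $C_h$ of asymptotic level $1 - \alpha$, such that $\lim_{n \to \infty}P[\hat{\rho}(h) \in C_h] = 1 - \alpha$, the number of empirical autocorrelations lying within their intervals is

$$M = \sum_{h=1}^{m}\mathbb{1}_{\{\hat{\rho}(h) \in C_h\}}.$$

By definition of $C_h$,

$$E(M) = \sum_{h=1}^{m}P[\hat{\rho}(h) \in C_h] \to m(1 - \alpha), \quad \text{as } n \to \infty.$$

Moreover,

$$\begin{aligned}\mathrm{Var}(M) &= \sum_{h=1}^{m}\mathrm{Var}(\mathbb{1}_{\{\hat{\rho}(h) \in C_h\}}) + \sum_{h \neq h'}\mathrm{Cov}(\mathbb{1}_{\{\hat{\rho}(h) \in C_h\}}, \mathbb{1}_{\{\hat{\rho}(h') \in C_{h'}\}})\\ &= \sum_{h=1}^{m}P(\hat{\rho}(h) \in C_h)\{1 - P(\hat{\rho}(h) \in C_h)\}\\ &\quad + \sum_{h \neq h'}\{P(\hat{\rho}(h) \in C_h, \hat{\rho}(h') \in C_{h'}) - P(\hat{\rho}(h) \in C_h)P(\hat{\rho}(h') \in C_{h'})\}\\ &\to m\alpha(1 - \alpha), \quad \text{as } n \to \infty.\end{aligned}$$

We have used the convergence in law of $(\hat{\rho}(h), \hat{\rho}(h'))$ to a vector of independent variables. When the observed process is not a noise, this asymptotic independence does not hold in general.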


1.13 The probability that all the empirical autocorrelations stay within the asymptotic significance intervals (with the notation of the solution to Exercise 1.12) is, by the asymptotic independence,

$$P[\hat{\rho}(1) \in C_1, \ldots, \hat{\rho}(m) \in C_m] \to (1 - \alpha)^m, \quad \text{as } n \to \infty.$$

For $m = 20$ and $\alpha = 5\%$, this limit is approximately equal to 0.36. The probability of not rejecting the right model is thus low.
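The limits in Exercises 1.12 and 1.13 are simple closed forms. The short Python computation below evaluates them for the values $m = 20$ and $\alpha = 5\%$ used in the text; it only restates $E(M) \to m(1-\alpha)$, $\mathrm{Var}(M) \to m\alpha(1-\alpha)$ and $(1-\alpha)^m$, which rely on the asymptotic independence discussed above.

```python
alpha, m = 0.05, 20
p_all_inside = (1 - alpha) ** m   # limit of P[all m autocorrelations inside their bands]
expected_M = m * (1 - alpha)      # limit of E(M)
var_M = m * alpha * (1 - alpha)   # limit of Var(M)
print(round(p_all_inside, 4), expected_M, var_M)
```

So, even when the model is right, on average about one of the 20 empirical autocorrelations is expected to fall outside its band, and all 20 fall inside with probability only about 0.36.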


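1.14 In view of (B.7) we have $r_X(1) = \rho_X(1)$. Using step (B.8) with $k = 2$ and $a_{1,1} = \rho_X(1)$, we obtain

$$r_X(2) = a_{2,2} = \frac{\rho_X(2) - \rho_X^2(1)}{1 - \rho_X^2(1)}.$$

Then, step (B.9) yields

$$a_{2,1} = \rho_X(1) - a_{2,2}\rho_X(1) = \rho_X(1) - \rho_X(1)\frac{\rho_X(2) - \rho_X^2(1)}{1 - \rho_X^2(1)} = \rho_X(1)\frac{1 - \rho_X(2)}{1 - \rho_X^2(1)}.$$

Finally, step (B.8) yields

$$\begin{aligned}r_X(3) &= \frac{\rho_X(3) - \rho_X(2)a_{2,1} - \rho_X(1)a_{2,2}}{1 - \rho_X(1)a_{2,1} - \rho_X(2)a_{2,2}}\\ &= \frac{\rho_X(3)\{1 - \rho_X^2(1)\} - \rho_X(2)\rho_X(1)\{1 - \rho_X(2)\} - \rho_X(1)\{\rho_X(2) - \rho_X^2(1)\}}{1 - \rho_X^2(1) - \rho_X^2(1)\{1 - \rho_X(2)\} - \rho_X(2)\{\rho_X(2) - \rho_X^2(1)\}}.\end{aligned}$$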


SP500 Prices 
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SP 500 from 1/3/50 to 7/24/09 SP500 Returns 
Q oO 
(= -J 
10 o 
a T 
2 E8] 
(am = 
22 
oO 
N 4 
Sx T T T T = T T T T 
0 5000 10000 15000 0 5000 10000 15000 
Autocorrelations of the returns ACF of the squared returns 


Figure C.1 Closing prices and returns of the S&P 500 index from January 3, 1950 to July 24, 


2009. 


1.15 The historical data from January 3, 1950 to July 24, 2009 can be downloaded via the URL 
http://fr.finance.yahoo.com/q/hp?s = %5EGSPC. We obtain Figure C.1 with the 
following R code: 


# reading the SP500 data set 
sp500data <- read.table("sp500.csv",header=TRUE, sep=",") 
sp500<-rev(sp500data$Close) # closing price 
n<-length(sp500) 
rend<-log(sp500[2:n]/sp500[1: (n-1)]); rend2<-rend*2 
op <- par(mfrow = c(2, 2)) # 2 x 2 figures per page 
plot (ts(sp500),main="SP 500 from 1/3/50 to 7/24/09", 

ylab="SP500 Prices",xlab="") 
plot (ts(rend) ,main="SP500 Returns",ylab="SP500 Returns",xlab="") 
acf(rend, main="Autocorrelations of the returns",xlab="",ylim=c(-0.05,0.2)) 
acf(rend2, main="ACF of the squared returns",xlab="",ylim=c(-0.05,0.2)) 
par (op) 


Chapter 2 


2.1 This covariance is meaningful only if E Er < œ and Ef 2 (ern) < oo. Under these assump- 
tions, the equality is true and follows from E (e, | €u, u < t) = 0. 
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2.2 In case (i) the strict stationarity condition becomes œ + 6 < 1. In case (ii) elementary integral 


2.3 


2.4 


computations show that the condition is 


| B [3a 
2,/ — arctan ,/ — + log(3a + £) < 2. 
3a B 


Let A,,..., Am be the eigenvalues of A. If A is diagonalizable, there exist an invertible matrix 
P and a diagonal matrix D such that A= P~!DP. It follows that, taking a multiplicative 
norm, 


log || A‘|| = log || P71 DD’ P| < log ||P" ILD INPI 
= log || P7! || + log || D' || + log || PI. 


For the multiplicative norm ||A|| = X |a;;| we have log || D‘|| = log )>""_, Aj. The result fol- 
lows immediately. 

When A is any square matrix, the Jordan representation can be used. Let n; be the 
multiplicity of the eigenvalue 4;. We have the Jordan canonical form A = P~!JP, where P 
is invertible and J is the block-diagonal matrix with a diagonal of m matrices J; (A;), of size 
ni X ni, with A; on the diagonal, 1 on the superdiagonal, and O elsewhere. It follows that 
At = P~'J'P where J‘ is the block-diagonal matrix whose blocks are the matrices J} (Ai). 
We have Jj(Ai) = Ain; + Ni where N; is such that Nv = 0n;xn,;- It can be assumed that 
[Ail > là2| >... > |Am|. It follows that 


m m 


eJ QDI = 7 Toe DN In, + NDI 


t 
(ards) 


m ni-l 
= |à tie > ( : ) ` 


i=l k=0 


1 t 
zrs 


m 


I+ los 


ll 
= 


|| NI 


7 Vaal 
> |All 


as t — oo, and the proof easily follows. 


We use the multiplicative norm || A|| = > |aj;|. Thus log || Az, || < log ||A|| + log ||z;|l, there- 
fore log? || Az;|| < log || A|| + log* |z;|, which admits a finite expectation by assumption. It 
follows that y exists. We have 


t t 
log (JArAr1-- Arll) = (veras Is) = log ||A"|| + Ý- log |zi| 


i=1 i=l 


and thus 


’ 1 re es 
y = lim as. (zoe ses >be) 


i=1 
Using (2.21) and the ergodic theorem, we obtain 
y = log p(A) + E log |z|. 


Consequently, y < 0 if and only if p(A) < exp (—E log |z;|). 
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2.5 For the Euclidean norm, multiplicativity follows from the Cauchy—Schwarz inequality. Since 
N(A) = sup,zo || Axl|/Ilx|], we have 


|ABx|| || Bx] _ JABx|| || Bx 


N(AB) = 


< N(A)N(B). 


sup s 
x#0,Bx#0 |Bxl] Ixl] ~ x#0,8x#0 Bxll x0 (Il 


To show that the norm N; is not multiplicative, consider the matrix A whose elements are 
all equal to 1: we then have Nı (A) = 1 but N; (A?)>1. 


2.6 We have 

Bi + ain? Bo e= Bp a2 Qq 

1 0O .«-. 0 0 0 

0 1 0 0 0 

0 1 0 0 0 0 

* 

A= ne, 0 0 0 

0 0 1 0 0 

0 0 0 1 0 

0 0 © Q 1 0 


and bř = (w,0,..., 0)’. 


2.7 We have e, = Jotaje? , + ae? 5m, therefore, under the condition a; + a2 < 1, the 


moment of order 2 is given by 
w 
E= — 2 _ 
1— Oy = Qj 


(see Theorem 2.5 and Remark 2.6(1)). The strictly stationary solution satisfies 


4 2 2-42 
Ee; = 4E (w + æE; + aE) 


= MAA + (a; + a3) Ee + 2@(a, + a) Ee? + aa Eee? ;} 


in RU {+00}. Moreover, 


2,2 2 2,2 2 4 2,2 
Eere; = Elo + giei +a26_,)e_) = 0Ee; +a, Ee; + an Ee; 261 
which gives 
2.2 2 4 
(1 —a2)Eee_, = wEe +a\Ee,. 


Using this relation in the previous expression for E er , we obtain 


Eet E =a E faid +a) +020 — a») 


2w 
j {a + a2(1 — wo) i 


_ 2 
= mfo z (1 — a2). — ay — a2 
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Figure C.2 Region of existence of the fourth-order moment for an ARCH(2) model (when 
u4 = 3). 


2.8 


If E er < oo, then the term in brackets on the left-hand side of the equality must be strictly 
positive, which gives the condition for the existence of the fourth-order moment. Note that 
the condition is not symmetric in œı and a. In Figure C.2, the points (œ1,œ2) under the 
curve correspond to ARCH(2) models with a fourth-order moment. For these models, 


u4a? (1 + a + aa — a5) 
| 2 2 Gc 
(1 — a — œ) [1 — a — u4 {a7 (1 + a2) + a5 (1 — œ2)}] 


We have seen that (e?) admits the ARMA(1, 1) representation 
E — (a + Ble) = o + v — Bvr, 


where v, = e? -E (lew u < t) is a (weak) white noise. The autocorrelation function of a 
thus satisfies 


path) = (a+ B)pa(h—1), Yh>1. (C.1) 


Using the MA(oo) representation 


CO 
(62) i 
Sga A + a 
we obtain 
2 2 Ş 2(i—1) 2 a? 
2(0) = E 1 = =E 1 + —————— 
a a a ere v( +a) 
and 


9° 2 
val) = Ev; (: +07 (a +f) (@+ ar») =Evy (o + “| ; 


i=l 
It follows that the lag 1 autocorrelation is 


a (1— 8? — ap) 


pal = ar op 
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The other autocorrelations are obtained from (C.1) and p,2(1). To determine the autocovari- 
ances, all that remains is to compute 


Ev? = E(é? — 07)? = E (n? — 1} Eos = 2E0f, 
which is given by 


Eo; = E(w + ae? + Bo)” 


=% + 3a°Eo; H p’ Eo; + 2w(a 4 B)Eo? + 2apEo: 


wo + 2o(a + B) er B a (1 +a + B) 
1—302— B?—2oB  (1—a—f)(1 — 302 — B? — 248) 


2.9 The vectorial representation z, = b, + A;z,_, is 


e \_ ( on an Bap \ ( e 
(or aC) +O Ce) 


We have 


3a 3B 
EA, Qb, = Eb; ® A; = i ; , EF d 
a É 


2 2 
bO = o go % $ Æ P 


eS = = Ww 


The eigenvalues of A® are 0,0,0 and 3a? + 2æß + 8°, thus I4 — A® is invertible (0 is 
an eigenvalue of I, — A® if and only if 1 is an eigenvalue of A®), and the system (2.63) 
admits a unique solution. We have 


3 
1 
bO + (EA, @ b, + Eb, @ Ay) 2 = ort tet | 1 
= 7 ~ 1-—a- 8 1 
1 
The solution to (2.63) is 
3 
20) = wo (1 +a +$) 1 
ma 20 = 30? = 208 = 87) | 1 
1 


As first component of this vector we recognize Ee} , and the other three components are 
equal to E of . Equation (2.64) yields 


: 1 a 0 B 0 
wo 1 0 a 0 £ 

Ez, 8 Zn = I=- | 1 tja 0 BO Ez, @Z pgp 
1 0 a 0 Æ 
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which gives y,2(-), but with tedious computations, compared to the direct method utilized in 
Exercise 2.8. 


2.10 1. Subtracting the (q + 1)th line of (AJ,4, — A) from the first, then expanding the determi- 
nant along the first row, and using (2.32), we obtain 


A-—B, -b +: =f 
-1 À si 0 
det(AIp+q — A) = 2% 0 a s 0 
0 =f. à 
—1 À 0 
g <i 0 
TODA] ta SLAP 
0 =i. % 
=j —a2 nie Gy 


Pp 
=at |1- Bat 


j=l 


-—a, —-a —Ay 
-1 À 0 
+ S Dt (yt ar 0 —1 0 
0 —1 À 


P 
=at |1- pa 


j=! 


q 
+ Pta ( — (&ı + JIA — Laa) , 


i=2 


and the result follows. 


2. When 4 aj + Ei Êj = 1 the previous determinant is equal to zero at A = 1. Thus 
p(A) > 1. Now, let à be a complex number of modulus strictly greater than 1. Using the 
inequality |a — b| > |a| — |b|, we then obtain 


[det(A — Alp+g)|>1— $ œ; + Bi) =0. 
i=l 
It follows that p(A) < 1 and thus p(A) = 1. 


2.11 For all € > 0, noting that the function f(t) = P (t~'|X1| > €) is decreasing, we have 


CO co CO 
SOP (Xl >e)= Yo P(X >e) al P (t'|Xi] >€) dt 
n=1 0 


n=l 


CO 
1 P (e7'|X | >t) dt = €7'E|X\| < 00. 


© 


2.12 


2.13 
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The convergence follows from the Borel—Cantelli lemma. 
Now, let (X,,) be an iid sequence of random variables with density f(x) = I°? Ix>1. 
For all K > 0, we have 
CO CO 1 
Ye >K)= +00. 


n=1 n=1 


nK 


The events {n—! Xn > K} being independent, we can use the counterpart of the Borel—Cantelli 
lemma: the event {n~!X,, > K for an infinite number of n} has probability 1. Thus, with 
probability 1, the sequence (n~!X,,) does not tend to 0. 


First note that the last r — 1 lines of B,A are the first r — 1 lines of A, for any matrix A 
of appropriate size. The same property holds true when B, is replaced by E(B,). It follows 
that the last r — 1 lines of E(B;A) are the last r — 1 lines of E(B,)E(A). Moreover, it can 
be shown, by induction on f, that the ith line @;,-; of B, ... Bı is a measurable function of 
the 7;_;, for j > i. The first line of B,+1B;... Bı is thus of the form aj(y;)€14-1 + +--+ 
ar (m—r)er1—r- Since 


E{ay(nr)€1,1-1 pert ar (Ntr )lrt—r} = Eai(n)E£ 4-1 +- + Ea, (mr) Er t-r 


the first line of EB,,, B; ... Bı is thus the product of the first line of EB,.; and of EB, ... By. 
The conclusion follows. 


1. For any fixed t, the sequence (2) converges a.s. (to z,) as K — oo. Thus 
= /K 


(K) =l) 
lz; Z || > 0 a.s. 
and the first convergence follows. Now note that we have 
K K-1 K K-1)y)8 
Big? a2"! Pe E le ell) 
< Elly WF + Elz’ < o. 


The first inequality uses (a + b) < a° + b° for a,b > 0 and s € (0, 1]. The second 


inequality is a consequence of E e” < œ. The second convergence then follows from 


the dominated convergence theorem. 


2. We have a — co = A;Ar—1... Ar-x+10,_ x. The convergence follows from the pre- 
vious question, and from the strict stationarity, for any fixed integer K, of the sequence 
(K) _ (K-1l) 
Zt = Zt ;FE Zh. 


3. We have 


S 


IXa¥ = 4 YO Xna Yy = ney Yy 
ij 


for any i’ = 1,..., £, j/ =1,...,m. In view of the independence between X,, and Y, 
it follows that E|X, vy Yp? = E|Xn ij E|Yj\" > 0 as. as n > oo. Since E|Y¥;|* is a 
strictly positive number, we obtain E|X,,;’;/|’ > 0 a.s., for all i’, j’. Using (a +b) < 
a’ + b’ once again, it follows that 


Ss 


E||Xall’ =E} > |Xnajlp <>) ElXnajl’ > 0. 
i,j ij 
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4. Note that the previous question does not allow us to affirm that the convergence to 0 of 
E(\|AyAg—1 ... Aibo||°) entails that of E(\|A,Axg—1... A1||°), because by has zero compo- 
nents. For k large enough, however, we have 


E(\|AcAg-1 «-- Ar Boll”) = EM AgAg-1--- An +1 Yl") 


where Y = Ay... Abọ is independent of Ay Ax—1...Ay41. The general term a; j of 
Ay... A, is the (i, j)th term of the matrix AY multiplied by a product of n? vari- 
ables. The assumption A >0 entails aj. j >0 as. for all i and j. It follows that the 
ith component of Y satisfies Y; >0 a.s. for all i. Thus EY’ >0. Now the previous 
question allows to affirm that E(||A,A,—1... An+1||5) —> 0 and, by strict stationarity, 
that E(\|A,_y Ag_n—1--- A1 l|) —> O as k > œ. It follows that there exists kg such that 
E(||Aty An- +» Ail) <1. 


5. If a or ĝi is strictly positive, the elements of the first two lines of the vector A?b are also 
strictly positive, together with those of the (q + 1)th and (q + 2)th lines. By induction, it 
can be shown that A™*‘?-)b, > 0 under this assumption. 


6. The condition Anbo >Q can be satisfied when a; = fı = 0. It suffices to consider an 
ARCH(3) process with a = 0, a2 > 0, a3 > 0, and to check that A4bọ > 0. 


2.14 In the case p = 1, the condition on the roots of 1 — ız implies |6| < 1. The positivity 
conditions on the ¢; yield 


go = w/(1 — B) > 0, 
gi =a, = 0, 
$2 = Bia, + a = 0, 
g—1 = Bio + BA a2 +++ + Biagi + og > 0, 


be = BE t bg-1 20, k>Q. 


The last inequalities imply 6; > 0. Finally, the positivity constraints are 


k+1 
w>0, 0<ß <1, Xapi, k=0,...,q—1. 
i=l 
If q = 2, these constraints reduce to 
o>0, 0< <l, a) > 0, biai +% = 0. 


Thus, we can have a2 < 0. 


2.15 Using the ARCH(q) representation of the process (e; together with Proposition 2.2, we 
obtain 


peli) = ape — 1) +--+ + &i-ipell) + ai + 41920) + +++ +g p.2(g — 1) = a. 
2.16 Since p.2(h) = a1 p.2(h — 1) + ape (h — 2), h>0, we have p2(h) = àr? + ur} where 
A, u are constants and r1, r2 satisfy rı + r2 = a1, rir2 = —&2. It can be assumed that r2 < 0 


and rı > 0, for instance. A simple computation shows that, for all h > 0, 


pe(2h+1)<pa(2h) — uin <A r). 


2.17 


2.18 
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If the last equality is true, it remains true when A is replaced by h + 1 because i < re Since 
P21) < pe2 (0), it follows that p,2(2h + 1) < p,2(2h) for all h > 0. Moreover, 


pe(2h)>paQh-1) => ufr- 1r >A r)”. 


Since rê < r?, if pe (2) < pa (1) then we have, for all h > 1, pæ (2h) < pa (2h — 1). We 
have thus shown that the sequence p,2(h) is decreasing when p,2(2) < p,2(1). If p.2(2) > 
Pe2 (1) > 0, it can be seen that for h large enough, say h > ho, we have p,2(2h) < p.2(2h — 1), 


again because of r5 < re. Thus, the sequence foa COIS is decreasing. 


Since X, + Y, — —oo in probability, for all K we have 
P (Xn + Yn < K) 
< P(X, < K/2 or Y, < K/2) 


=P (Y, < K/2)+P(X, < K/2){1—P(% < K/2)} > 1. 


Since X, Æ —oo in probability, there exist Kg € R and ng € N such that P (X, < Ko/2) < 
ç < 1 for all n > no. Consequently, 


P (Yn < K/2) + s {1 — P (Yn < K/2)} = 1 + (s — 1)P (Y, > K/2)> 1 


as n — oo, for all K < Ko, which entails the result. 


We have 
[a(m—1) oe a(m—n)o(M—n—1)}'/" 


1 n 
= exp $ > log{a(m-i)} + or >e” as. 


i=l 


as n — oo. If y < 0, the Cauchy rule entails that 


hy = oq) + $ a(n-1) «amo (M11) 


i=] 


converges almost surely, and the process (e+), defined by €, = //h;7;, is a strictly stationary 
solution of (2.7). As in the proof of Theorem 2.1, it can be shown that this solution is unique, 
nonanticipative and ergodic. The converse is proved by contradiction, assuming that there 
exists a strictly stationary solution (€;, oÊ). For all n > 0, we have 


oo > o(n-1) + J a(n-1) -a (-)o (M-i-1). 


i=1 


It follows that a(n_1)...a(7-n)@(N_n_1) converges to zero, a.s., asn — œO, or, equivalently, 
that 


n 


Y loga(n;) +log@(n-n-1) > —œ as. as n —> ow. (C.2) 


i=l 


We first assume that EF log{a(7:)}>0O. Then the strong law of large numbers entails 
ey loga(n;) — +00 a.s. For (C.2) to hold true, it is then necessary that log œ (n-n-1) > 
—oo a.s., which is precluded since (7) is iid and œ(ņọ)>0 a.s. Assume now that 
E log{a(n;)} = 0. By the Chung—Fuchs theorem, we have lim sup }`;_; log a(n) = +00 with 
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probability 1 and, using Exercise 2.17, the convergence (C.2) entails log w(7_~n~1) > —0o 
in probability, which, as in the previous case, entails a contradiction. 


2.19 Letting a(z) =A + (1 — å)z?, we have 


oF = a(m—1)07_, = a(m-1) ++ a(m) {Aap + (1 — A)ogng}. 


Regardless of the value of o > 0, fixed or even random, we have almost surely 


1 1 1 
7 logo = 7log {ào +(1- Aoono} + T 2 logat 


—> Eloga(n,) < log Ea(n,) = 0 


using the law of large numbers and Jensen’s inequality. It follows that o? — 0 almost surely 
as t > 0d. 


2.20 1. Since the ġ; are positive and A; = 1, we have ¢; < 1, which shows the first inequality. 


The second inequality follows by convexity of x œ> x log x for x > 0. 


. Since A; = | and A, < œF, the function f is well defined for q € [p, 1]. We have 


CO 
fq) =log ) > 67 + log Elnol*4. 


i=1 


The function q +> log E|no|*2 is convex on [p, 1] if, for all A € [0, 1] and all g,q* € 
[p. 1], 
log E |o]? t?0 1" < A log Enol? + (1 — A) log Elro”, 


which is equivalent to showing that 

E[X*¥'™] < [EXP [EY], 
with X = |no|27, Y = Ino]? . This inequality holds true by Hölder’s inequality. The same 
argument is used to show the convexity of q b log ey VH . It follows that f is convex, 


as a sum of convex functions. We have f (1) = 0 and f(p) < 0, thus the left derivative 
of f at 1 is negative, which gives the result. 


. Conversely, we assume that there exists p* € (0, 1] such that Ap» < oo and that (2.52) is 


satisfied. The convexity of f on [p*, 1] and (2.52) imply that f (q) < 0 for q sufficiently 
close to 1. Thus (2.41) is satisfied. By convexity of f and since f(1) =0, we have 
f(4) < 0 for all g € [p, 1[. It follows that, by Theorem 2.6, Ee;|? < oo for all q € [0, 2[. 


2.21 Since E(e? |e, u<th= o?, we have a = 0 and b = 1. Using (2.60), we can easily see that 


Var(o7) a? 1 


E T E = Ao 
Var(€;) 1 — 2a, Bi — By Ky 


’ 


since the condition for the existence of Ee} is 1 — 2a; fp, — pi > OF Ky Note that when the 
GARCH effect is weak (that is, a; is small), the part of the variance that is explained by 
this regression is small, which is not surprising. In all cases, the ratio of the variances is 
bounded by 1/x,, which is largely less than 1 for most distributions (1/3 for the Gaussian 
distribution). Thus, it is not surprising to observe disappointing R? values when estimating 
such a regression on real series. 
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Chapter 3 


3.1 


3.2 


3.3 


Given any initial measure, the sequence (X;);en clearly constitutes a Markov chain on 
(R, BCR)), with transition probabilities defined by P(x, B) = P(X; € B | Xo = x) = P; (B — 
Ox). 

(a) Since P, admits a positive density on R, the probability measure P(x, .) is, for all 
x € E, absolutely continuous with respect to à and its density is positive on R. Thus any 
measure y which is absolutely continuous with respect to à is a measure of irreducibility: 
Vx € E, 

~(B)>0>A(B)>0=> P(x, B)>0. 


Moreover 4 is a maximal measure of irreducibility. 

(b) Assume, for example, that £, is uniformly distributed on [—1, 1]. If 0 > 1 and Xo = 
xo > 1/(@ — 1), we have xp < X; < X2 <..., regardless of the ¢,. Thus there exists no 
irreducibility measure: such a measure should satisfy g(] — oo, x]) = 0, for all x € R, which 
would imply g = 0. 


If (Xn) is strictly stationary, X; and Xo have the same distribution, jz, satisfying 
VB €B, w(B)=P(Xo € B) = P(X, € B) = / P(x, B)du(x). 


Thus jz is an invariant probability measure. 
Conversely, suppose that u is invariant. Using the Chapman—Kolmogorov relation, by 
which Yt e N, Vs, O<s<t, VWxreE, VBE€E, 


P'(x, B)= f P*(x, dy) P'S (y, B), 
yeE 


we obtain 


a(B) = PIX: € BJ = f [f PO, dvuld»)| P(x, B) 


= | way f Poanre. B) = | wayP ty. 8) = PLX) € B]. 


Thus, by induction, for all t, P[X; € B] = u (B) (YB € B). Using the Markov property, 
this is equivalent to the strict stationarity of the chain: the distribution of the process 
(Xt, Xi+1, ---, Xt+k) is independent of t, for any integer k. 


We have 


z(B) = im fuan, B) 


— lim uid» f Pæ, dy) PO, B) 


t>+00 


[Po B) lim | Pa ayaw 
y =F 4-00 x 


= f P(y, B)x(dy). 


Thus z is invariant. The third equality is an immediate consequence of the Fubini and 
Lebesgue theorems. 


382 GARCH MODELS 


3.4 Assume, for instance, 0 >0. Let C=[-c,c], c>0, and let 5 =inf{f(x); x € 
[-d+0)c, (1 +6@)c]}. We have, for all A C C and all x € C, 


P(x, A) = i f(y — Ox)dA(y) > SAÇA). 
Now let B € E. Then for all x € C, 
P(x, B) = Í P(x, dy) P(y, B) 
> ih P(x, dy)P(y, B) 
> 8 [aro B) := v(B). 


The measure v is nontrivial since v(E’}) = 6A(C) = 26c > 0. 


3.5 It is clear that (X,) constitutes a Feller chain on R. The A-irreducibility follows from the 
assumption that the noise has a density which is everywhere positive, as in Exercise 3.1. In 
order to apply Theorem 3.1, a natural choice of the test function is V(x) = 1 + |x|. We have 


E(V(X; | X:-1 = x)) < 1+ E(|0 + be:|)|x| + E(ler|) 
:= 14+ K,|x|+ Ko = Ki g(x) + K2 +1- Kı. 


Thus if Kı < 1, we have, for Kı < K < 1 and for g(x) >(K2+1— K,)/(K — Kı), 
E(V(X; | X1 = x)) < Kg(x). 


If we put A = {x; g(x) = 14+ |x| < (K2 + 1 — K,)/(K — Kı)}, the set A is compact and the 
conditions of Theorem 3.1 are satisfied, with 1 — ô = K. 


3.6 By summing the first n inequalities of (3.11) we obtain 


n=l n—l n—-l 
P+! V(x) < (1 —8) XOP Vo) +b > P' (xo, A). 
t=0 t=0 t=0 


It follows that 


n—-1 n—-1 


b» P' (xo, A) > ô XO PVC) + P” V (xo) — (1 — 8) V (xo) 


t=0 t=1 


>(n—15+1—-(1—8)M, 


because V > 1. Thus, there exists x > 0 such that 


1 n—1 i= {=M 1 
On(x0, D A) = — 8 + —— - ok 


t=1 


nb n 


Note that the positivity of 6 is crucial for the conclusion. 
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3.7 We have, for any positive continuous function f with compact support, 


Í ford») = lim / FV) On (x0, dy) 
E > JE 
= lim | / PF(y) Om (X0. dy) 
k>ow E 
1 
+- / fo) (Pao, dy) = P", dy) } 
NkJE 
= lim f Pid 
k->0o E 
> f P f(y) (dy). 
E 


The inequality is justified by (i) and the fact that Pf is a continuous positive function. It 
follows that for f = Ic, where C is a compact set, we obtain 


m(C) >f P(y, C)m(dy) 
E 


which shows that, 


VB €E, m(B)> f P(y, B)x(dy) 
E 


(that is, x is subinvariant) using (ii). If there existed B such that the previous inequality were 
strict, we should have 


u(E) = 2(B)+7(B‘) 


= / P(y, B)x(dy) +f P(y, Bn (dy) = f P (y, E)n(dy) = m (E), 
E E E 
and since m (E) < co we arrive at a contradiction. Thus 
YBEE, xn(B)= f P(y, B)n (dy), 
E 


which signifies that x is invariant. 
3.8 See Francq and Zakoïan (2006a). 


3.9 If sup, nu, were infinite then, for any K >0, there would exist a subscript no such that 
Noun, > K. Then, using the decrease in the sequence, one would have Dri Uk = Ngon > K. 
Since this should be true for all K > 0, the sequence would not converge. This applies directly 
to the proof of Corollary A.3 with un = {a x(n)}"/C*™), which is indeed a decreasing sequence 
in view of point (v) on page 348. 

3.10 We have 
k-1 


d4 = 2 |Cov (X; X14, X1-eX14Kn-0)| < d7 + dg, 
e=1 
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where 


k-1 


dı = `> [Cov (Xe, X:Xi4nXi+k-0)1, 
(=1 


k-1 
dg = 5 |EX:Xi+nEXt—eXt+k-el - 
(=1 


Inequality (A.8) shows that d7 is bounded by 
CO 
K X axe. 
(=0 


By an argument used to deal with de, we obtain 


dg < K sup(k — Dax (k) 0H < 00, 
k>1 


and the conclusion follows. 


3.11 The chain satisfies the Feller condition (1) because 


9 
x k\ 1 
BleX) |X-i=ab=Doe(S45) 5 
SNT 10) 9 


is continuous at x when g is continuous. 
To show that the irreducibility condition (ii) is not satisfied, consider the set of numbers 
in [0, 1] such that the sequence of decimals is periodic after a certain lag: 


= {x = 0, uju2... such that dn > 1, (ur)r>n periodic} . 


For all h > 0, X, € B if and only if X;,, € B. We thus have, 


Vt, P'(X,€B|X9=x)=0 forx¢B 


and, 


Vt, P'(X;€B|Xo=x)=0 forx eB. 


This shows that there is no nontrivial irreducibility measure. 
The drift condition (iii) is satisfied with, for instance, a measure @ such that 
¢ ([—1, 1]) > 0, the energy V(x) = 1 + |x| and the compact set A = [—1, 1]. Indeed, 


X Ur 
E(X|+1|X1=ysk(lo+5 F 


js e saai 
10 10 5 i 


10 ~ 


provided 


9 1 
(2a es 
~ T+ |x| 
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Chapter 4 


4.1 


4.2 


2 


2 |, where œ = wn? and a; = an?. It follows that 


We have a = Wt + aye 


2 2 
Ent = mt (w FAOmt-1 FF AAmt - - - Cmt—m+2@mt—m+1 
2 
+ AAmt-1--- Omt—m+1€m(t—1)) 


(age Te ~ 2 
= Nmt (© ot 1 En(t—1))- 


Since (€;) is supposed to be a nonanticipative solution of the model, €mg—1) is independent 


Of Nmr, -- -Nmt-m+1. It follows that, using the independence of the process (n+), 
E(€mt|€m(t—1) Em(t—2)> ++ a) = 0 
Var (€mt|€m(—1)> Em(t—2), ++ ) = ot+a+-:--4 a”) } ader gaily: 
Thus, the process (€,,;) is a semi-strong ARCH(1). 
Let 
~ Emt 
™ = Woe eB 
{Var(€mtl€mc@—1)> Em(t—2)1 ++ -)} E 
m ~ 2 1/2 
=n O; + En (t—1) 
= Nme | ——_i—@-@--—---mN——_— 
m a(l baoisg aly ae 
2 1/2 
o + OF Emna- 
= Nmt 1 + a ere eee ae SS i ee ’ 
ol Fats Fa" ame 
where wf = @ —@(lta+---4 a”—!) and ay = &, — a”. As in the case m = 2, we show 


that (€»;) is not a strong ARCH(1) process by showing that (ñ+) is not iid. We assume that 
(ñ+) is iid and give a proof by contradiction. Note that 


2 ~2 2 2 2 =2 2 —1 
Em(t—1) {a GR Nne) CHUPA — Mint ©r (Hj; Nin OC SPO eae a™ ). 


Since the random variable e? (1) is nondegenerate (assuming that n> a- is nondegenerate) 


and is independent of (ñ+, Nmt, @7, a7) (the vector (w7, a;*) being a measurable function of 
Mmt, Nmt—-1s -+> Nmt-m+1}), we have (see Exercise 11.3) 


mggx2 2 PR 
a (ij; ~ Nt) = & Nmt = 0 


and 


nzo — (fr — nol a a”) =0 


with probability 1. When «œ # 0, elementary calculations show that we then have 


m—1,2 


2 2: 2 a 2 
Te Ot F HA Nmt—1 `t Nmt-m41 = Nmt- `t Nmt—m+1: 


2 


In view of Exercise 11.3 this is impossible, because the law of ngm 4 


, is nondegenerate. 


By Theorem 4.2, we know that (€,,;) admits a weak GARCH(1, 1) representation. We now 
give a direct proof and compute the coefficients aim), bon) and Con) of this representation. 
Let 


2 
ag_, =C + vi — bivi — bovj,-2 
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be the GARCH(1, 2) representation of (€+). We have 


Sage = (1 +aL +- +a”! L™ Nt — aL)e? 


=c(l+---+a™ (1 4--- +a" L” Yd — bL — baL’, 


=c¢c(1+--- a”) Ur: 


The variable v; being function of (1;, ¥;-1, ..., Ve—-m—1), Where (v;) is a noise, the process Vmr 
is an MA(1) process of the form Umt = Von), — Don) Von),t—1- We obtain bom) as the solution, 
of modulus less than 1, of the equation 


bon) = —Cov(v; ’ Ur—m) 


1402, Var(vr) 


7 a" (bya +t by + abo(a — bi))A a’) 
~ 1 —a2)(1 + baD) + (a — b) — a2-D) + Sy,’ 


where ôm = 2a(a — by) — 2a?" (1 + bi) + ba(1 + a2"). We retrieve the result 
obtained for the GARCH(1, 1) process when b: = 0. 


Using the independence between W, and €,_, for h > 0, and then the independence between 
the process (W;) and (e;), we have 


Cov(e?, €?) = Cov(e?, e?) + Cov(W7, e2) + 2Cov(e; W;, €7_;) 


= Cov(e?, eo) + Cov(e?, W2 ,) + 2Cov(e?, e€t—n Wi—-n) 
= Cov(e?, e7 ,). 


Except for the variance, (€,) and (e;) thus have the same autocovariance structure. Since (e;) 
is a strong GARCH(p, q), the autocovariance structure of its square is determined by 


max{p.q} 
Cov(e?, e?_,) = >D (ai + bi)Cov(e?, EET h> p. 


i=1 


The same relation holds true for (e?) whenever h > max{p, q} (since the last term in the 
sum is Cov(e?, ei pinaxt ps gy which cannot be replaced by Cov(e?, <3 +max{ ), unless 


Pq} 
h >max{p,q}). The ARMA representation of (e?) follows. 


It suffices to verify that 
E(v)=0 and E(u;y,-~) =0, Wk >max{p, q}. 


The first equality follows from the fact that (€,) and (u+) are noises. For k > max{p, q}, we 
have 


q 
E (vrv) = 2c X aj E (€r—i V-k) + X aiajE(€ri€r jv) 
i=l iAj 
P 
+E (uvik) — È bi E (uivi) 
i=l 
The first two sums are null because (€+) is a martingale difference. The last two sums are 


null because 7, is uncorrelated with any variable belonging to the o-field generated by the 
past of ez. 
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4.5 1. The relation follows from the independence between Z, and the v,_;, j 4 0. 
2. Expanding the equation for o?, we have 


h-1 


oF = Alv) + >) BU). BOY 14 A(Y-1) + BQY=1) «- BO nO 4- 
i=2 


It follows that 
Cov(B(v;)o7_,, A(vi-n)) = E{B(v1)}Cov(B(u-1) ... Burn) 07_,, A(Ur—n)) 
= [E{B(v,)}]"Cov(o7_,, Av) 
= [E{B(v,)}]"Cov(A (vi-n) + Brn)? n1 Arn) 
= [E{B(v,)}]"Var(A(v;)) + Cov(A(v,), B(v;)) E (0P)]. 
Similarly, 
Cov(B(u;)o7_,, Bi-n)o7_4_1) 
= [E{B(v:)}]" Cov(o2 p, Br—n)or_p_1) 
= [E{B (v) H" Cov(A (vrn) + BQ =n) O7-h 1+ BOn) n1) 
= [E{B(v;)}]"[Var(B(u;))(Eo;) + E(B) Var(o;) 
+ Cov(A (v+), B(v:)) EF]. 
The conclusion follows. 
3. Since the constraint d? + b? < 1 entails the second-order stationarity of (67), we have 


[a(l — d) + bef 


a © 2) _ 
E(o;) = —— and Var(a;") Ged -@ By 


l-d 
It follows that 
, [a(l — d) + bc}? 
ye (h) = Coyle” €? ,) = d" (—d2—a@2—By’ Vh>0. 
4. Relation (4.11) follows from the fact that y.2(h) = dy.2(h — 1) for h> 1. We obtain £ 
from the lag 1 autocorrelation of the process (€?), solving the quadratic equation 
(d+ p)l+ep) _ efa(1 — e) + bc}? 
1+2ep +8? — E(ZA (a(l — e) + bc} + c? (1 — e? — b?)) ` 


4.6 The chain being iid, the rows of its transition probability matrix are equal. It follows that the 
matrix is of rank 1. The unique nonzero eigenvalue is thus the trace, which is equal to 1. In 
this case, all the àg in (4.9) are equal to zero, and we have 


max{ p,q} p+K-1 
(:- a atin!) d=os I+ om BiL' | u. 
i=l 


i=l 
Note that the AR part is the same as when w is constant. 
2 i 
4.7 1. Ee? = 0T li) 


2. The matrix P of the transition probabilities admits the eigenvalues 1 and à. Note that 
—]1 < à < 1 by the irreducibility and aperiodicity assumptions. Diagonalizing P, it is 
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easy to see that the elements of P* are of the form p“(i, j) = a,(i, j) tani, j)a*, 
for k > 0. Taking the limit as k tends to infinity, we obtain a(i, j) = (j), and using 
the value k=0, we obtain a(i, j) + a2(i, j) = I=j;. It follows that for j= 1,2 
and i Æ j, 


PPA D =r), pG, j=) +A- ar), 


and (4.15) follows. 


. Using the independence between the process (n;) and (A+), for k > 0 we have 


Cov(e?, €p) = Cov {a?(A,), o7(Ar_x)} 
= E {o?(A,)o*(A;_«)} — {E0 (Ap)? 


2 2 2 
= D PU, Drioiwj — {y=} 


ij=1 


2 
= X p80, D- rU Doo. (C.3) 


ij=1 


Using (4.15), we then have 


2 
Cove, 674) = 4 YOA -aG -Y rOx(Noie; 


j=l ižj 
= Ko) — oP r (122), k>0. (C.4) 

. Similar calculations show that 
Var(€?) = {w1 — oF n(n (2) + {wpm (1) + o3 (2)}Var(n?). (C.5) 


. In view of (C.4), we have Cov(€?, e?) = ACov(€?, €? ,) for k > 1. By (C.5), this relation 


is generally not true for k = 1. It follows that e satisfies an ARMA (1, 1) model of 
autoregressive coefficient À. 


. In this case 4 = 0, thus Ej (up to its mean) is a white noise. 


. We obtain 


e? — 0.6e2; = 1+ u; —0.427Bu,-1, 


where (u;) is a noise. 


We verify that (e+) is a noise, using the independence of the sequence (7;) and the 


existence of u4 = En}. For k > 1, we have Bay i eki = l and Enni ana = 


l4. It follows that 


Cov(e?, e?) =0, k>l, 


Cov(e?, N = u4- l. 
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Thus (e?) admits an MA(1) representation of the form e? = l] + u; — Qus—1, where (ur) 
is a noise and 0 is a parameter which depends on p4. It follows that (€;,) is a weak 
GARCH(0, 1) process. 


2. The process (X;) := (e? — 1) is a weak MA-GARCH if (u?) has an ARMA representation. 
We have a 
u? = ou = x? + 2X, Soe Xi = Ups 


i=1 


This equation determines the AR part of the ARMA model. Note that the existence 
of E(n®) implies that of E (u$), and thus that of E(v7). 
In order to show that (v;) is an MA process, compute its autocovariance. We have 


CO 
Cov(v;, vik) = Cov(X?, X? y+ 2X, X O'X) 


i=1 
oo 


+ Cov(20X; X11, X? p + 2X1-% > OX) 


i=l 


CO co 
+ Cov(2X, > 6! Xi, X? p + 2X14 9 O'X). 
i=2 i=l 


Since X; is function of (v,, v;—1), the first covariance on the right-hand side is equal to 0 
for all k > 1. For the same reason, X,X,—1 is function of (v;, v;—1, ¥;—~2), thus the second 
term is null for all k >2. Finally, the last covariance is null using E(X,) = 0 and the 
independence between X; and X;— (for all k > 2). We have thus shown that (e? — l) is 
a weak ARMA (0, 1)-GARCH(1, 2) process. 


4.9 Assume that $\epsilon_{1t}$ and $\epsilon_{2t}$ are two independent weak GARCH processes. Without loss of generality, it can be assumed that these two GARCH models have the same order $(r, p)$ (adding null coefficients if necessary). The existence of an ARMA representation for $(\epsilon_{1t}^2)$ entails an autocovariance function of the form

$$\gamma_1(h) = \sum_{i=1}^{\ell}P_i(h)\,\phi_i^h,\quad \forall h > p,$$

where the $\phi_i$ ($1 \le i \le \ell \le r$) are the distinct complex roots of the AR polynomial and the $P_i$ are polynomials, the degree of $P_i$ being strictly less than the order of multiplicity $r_i$ of the root $\phi_i$. Similarly, we have

$$\gamma_2(h) = \sum_{i=1}^{m}Q_i(h)\,\psi_i^h,\quad \forall h > p,$$

with analogous notation. Using (4.13) with $\epsilon_t = \epsilon_{1t} + \epsilon_{2t}$, we obtain

$$\gamma_{\epsilon^2}(h) = \sum_{i=1}^{\ell}P_i(h)\,\phi_i^h + \sum_{i=1}^{m}Q_i(h)\,\psi_i^h,\quad \forall h > p.$$

It follows that $(\epsilon_t^2)$ is an ARMA process. The roots of the AR polynomial are the $\phi_i$ and $\psi_i$. Thus, if these roots are distinct, the AR polynomial of the aggregated process is obtained by multiplying the AR polynomials of the processes $(\epsilon_{it}^2)$.
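When the roots are distinct, forming the aggregated AR polynomial is plain polynomial multiplication. A minimal sketch follows; the two AR polynomials $1 - 0.9z$ and $1 - 0.8z$ are hypothetical values chosen only for illustration, and coefficients are listed constant term first.

```python
def poly_mul(p, q):
    """Coefficients (constant term first) of the product of two polynomials."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_eval(p, z):
    """Evaluate sum_k p[k] * z**k."""
    return sum(c * z ** k for k, c in enumerate(p))


# Hypothetical AR polynomials of (eps_{1t}^2) and (eps_{2t}^2), with distinct roots.
ar1 = [1.0, -0.9]
ar2 = [1.0, -0.8]
ar_aggregate = poly_mul(ar1, ar2)  # approximately [1.0, -1.7, 0.72]
```

The product vanishes at $1/0.9$ and $1/0.8$, the roots of the two factors, as the argument above requires.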
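Chapter 5

5.1 Let $(\mathcal{F}_t)$ be an increasing sequence of σ-fields such that $\epsilon_t \in \mathcal{F}_t$ and $E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0$. For $h > 0$, we have $\epsilon_t\epsilon_{t+h} \in \mathcal{F}_{t+h}$ and

$$E(\epsilon_t\epsilon_{t+h} \mid \mathcal{F}_{t+h-1}) = \epsilon_t\,E(\epsilon_{t+h} \mid \mathcal{F}_{t+h-1}) = 0.$$

The sequence $(\epsilon_t\epsilon_{t+h}, \mathcal{F}_{t+h})_t$ is thus a stationary sequence of square integrable martingale increments. We thus have

$$n^{1/2}\tilde\gamma(h) \xrightarrow{\mathcal{L}} \mathcal{N}(0, E\epsilon_t^2\epsilon_{t+h}^2),$$

where $\tilde\gamma(h) = n^{-1}\sum_{t=1}^{n}\epsilon_t\epsilon_{t+h}$. To conclude,¹ it suffices to note that

$$n^{1/2}\tilde\gamma(h) - n^{1/2}\hat\gamma(h) = n^{-1/2}\sum_{t=n-h+1}^{n}\epsilon_t\epsilon_{t+h} \to 0$$

in probability (and even in $L^2$).

5.2 This process is a stationary martingale difference, whose variance is

$$\gamma(0) = E\epsilon_t^2 = \frac{\omega}{1-\alpha}.$$

Its fourth-order moment is

$$E\epsilon_t^4 = \mu_4\left(\omega^2 + \alpha^2E\epsilon_t^4 + 2\alpha\omega E\epsilon_t^2\right).$$

Thus,

$$E\epsilon_t^4 = \frac{\mu_4(\omega^2 + 2\alpha\omega E\epsilon_t^2)}{1 - \mu_4\alpha^2} = \frac{\mu_4\,\omega^2(1+\alpha)}{(1-\alpha)(1-\mu_4\alpha^2)}.$$

Moreover,

$$E\epsilon_t^2\epsilon_{t-1}^2 = E(\omega + \alpha\epsilon_{t-1}^2)\epsilon_{t-1}^2 = \frac{\omega^2}{1-\alpha} + \alpha E\epsilon_t^4.$$

Using Exercise 5.1, we thus obtain

$$n^{1/2}\hat\gamma(1) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ \frac{\omega^2(1+\mu_4\alpha)}{(1-\alpha)(1-\mu_4\alpha^2)}\right).$$

5.3 We have

$$n^{1/2}\hat\rho(1) = \frac{n^{1/2}\hat\gamma(1)}{\hat\gamma(0)}.$$

By the ergodic theorem, the denominator converges in probability (and even a.s.) to $\gamma_\epsilon(0) = \omega/(1-\alpha) \ne 0$. In view of Exercise 5.2, the numerator converges in law to $\mathcal{N}\{0, \omega^2(1+\mu_4\alpha)/((1-\alpha)(1-\mu_4\alpha^2))\}$. Cramér's theorem² then entails

$$n^{1/2}\hat\rho(1) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ \frac{(1-\alpha)(1+\mu_4\alpha)}{1-\mu_4\alpha^2}\right).$$

The asymptotic variance is equal to 1 when $\alpha = 0$ (that is, when $\epsilon_t$ is a strong white noise). Figure C.3 shows that the asymptotic distribution of the empirical autocorrelations of a GARCH can be very different from that of a strong white noise.

¹ If $X_n \to x$ in probability, $x$ constant, and $Y_n \xrightarrow{\mathcal{L}} Y$, then $X_n + Y_n \xrightarrow{\mathcal{L}} x + Y$.

² If $Y_n \xrightarrow{\mathcal{L}} Y$ and $T_n \to t$ in probability, $t$ constant, then $T_nY_n \xrightarrow{\mathcal{L}} tY$.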
[Figure C.3: the two asymptotic variances plotted against α, with horizontal axis ticks 0.1 to 0.5.]

Figure C.3 Comparison between the asymptotic variance of $\sqrt{n}\hat\rho(1)$ for the ARCH(1) process (5.28) with $(\eta_t)$ Gaussian (solid line) and the asymptotic variance of $\sqrt{n}\hat\rho(1)$ when $\epsilon_t$ is a strong white noise (dashed line).
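The solid curve of Figure C.3 is the closed-form variance obtained in Exercise 5.3. The following minimal sketch evaluates it, assuming Gaussian $\eta_t$ by default ($\mu_4 = 3$, as in the figure). The function name is hypothetical, and the guard encodes the condition $\mu_4\alpha^2 < 1$ under which $E\epsilon_t^4$ is finite.

```python
def arch1_acf1_avar(alpha: float, mu4: float = 3.0) -> float:
    """Asymptotic variance of sqrt(n) * rho_hat(1) for an ARCH(1) process
    eps_t = sigma_t eta_t, sigma_t^2 = omega + alpha eps_{t-1}^2 (omega cancels out).
    Valid when mu4 * alpha^2 < 1, i.e. when E eps_t^4 is finite."""
    if not (0.0 <= alpha < 1.0 and mu4 * alpha ** 2 < 1.0):
        raise ValueError("need 0 <= alpha < 1 and mu4 * alpha^2 < 1")
    return (1.0 - alpha) * (1.0 + mu4 * alpha) / (1.0 - mu4 * alpha ** 2)


# Gaussian eta_t (mu4 = 3); the strong-white-noise value is 1 for every alpha.
curve = [(a, arch1_acf1_avar(a)) for a in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)]
```

At $\alpha = 0$ the function returns 1, and at $\alpha = 0.5$ it already reaches 5. This is why the usual $\pm 1.96/\sqrt{n}$ bands can be far too narrow for GARCH returns.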


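5.4 Using Exercise 2.8, we obtain

$$E\epsilon_t^2\epsilon_{t+h}^2 = \gamma_{\epsilon^2}(h) + \gamma_\epsilon^2(0).$$

In view of Exercises 5.1 and 5.3,

$$n^{1/2}\hat\rho(h) \xrightarrow{\mathcal{L}} \mathcal{N}\{0,\ E\epsilon_t^2\epsilon_{t+h}^2/\gamma_\epsilon^2(0)\}$$

for any $h \ne 0$.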
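5.5 Let $\mathcal{F}_t$ be the σ-field generated by $\{\eta_u, u \le t\}$. If $s + 2 > t + 1$, then

$$E\epsilon_t\epsilon_{t+1}\epsilon_s\epsilon_{s+2} = E\{E(\epsilon_t\epsilon_{t+1}\epsilon_s\epsilon_{s+2} \mid \mathcal{F}_{s+1})\} = 0.$$

Similarly, $E\epsilon_t\epsilon_{t+1}\epsilon_s\epsilon_{s+2} = 0$ when $t + 1 > s + 2$. When $t + 1 = s + 2$, we have

$$E\epsilon_t\epsilon_{t+1}\epsilon_s\epsilon_{s+2} = E\epsilon_{t-1}\epsilon_t\epsilon_{t+1}^2 = E\epsilon_{t-1}\epsilon_t\sigma_{t+1}^2 = E\epsilon_{t-1}\epsilon_t(\omega + \alpha\epsilon_t^2 + \beta\sigma_t^2)$$

$$= E\epsilon_{t-1}\sigma_t\eta_t(\omega + \alpha\sigma_t^2\eta_t^2 + \beta\sigma_t^2) = 0,$$

because $\epsilon_{t-1}\sigma_t \in \mathcal{F}_{t-1}$, $\epsilon_{t-1}\sigma_t^3 \in \mathcal{F}_{t-1}$, $E(\eta_t \mid \mathcal{F}_{t-1}) = E\eta_t = 0$ and $E(\eta_t^3 \mid \mathcal{F}_{t-1}) = E\eta_t^3 = 0$. Using (7.24), the result can be extended to show that $E\epsilon_t\epsilon_{t+h}\epsilon_s\epsilon_{s+k} = 0$ when $k \ne h$ and $(\epsilon_t)$ follows a GARCH(p, q), with a symmetric distribution for $\eta_t$.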


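5.6 Since $E\epsilon_t\epsilon_{t+1} = 0$, we have $\mathrm{Cov}\{\epsilon_t\epsilon_{t+1}, \epsilon_s\epsilon_{s+2}\} = E\epsilon_t\epsilon_{t+1}\epsilon_s\epsilon_{s+2} = 0$ in view of Exercise 5.5. Thus

$$\mathrm{Cov}\{n^{1/2}\hat\gamma(1), n^{1/2}\hat\gamma(2)\} = n^{-1}\sum_{t=1}^{n-1}\sum_{s=1}^{n-2}\mathrm{Cov}\{\epsilon_t\epsilon_{t+1}, \epsilon_s\epsilon_{s+2}\} = 0.$$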


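5.7 In view of Exercise 2.8, we have

$$\gamma_\epsilon(0) = \frac{\omega}{1-\alpha-\beta},\qquad \gamma_{\epsilon^2}(0) = \frac{3\omega^2(1+\alpha+\beta)}{(1-\alpha-\beta)(1-\beta^2-2\alpha\beta-3\alpha^2)} - \left(\frac{\omega}{1-\alpha-\beta}\right)^2,$$

$$\rho_{\epsilon^2}(1) = \frac{\alpha(1-\beta^2-\alpha\beta)}{1-\beta^2-2\alpha\beta},\qquad \rho_{\epsilon^2}(h) = (\alpha+\beta)\,\rho_{\epsilon^2}(h-1),\quad \forall h > 1,$$

with $\omega = 1$, $\alpha = 0.3$ and $\beta = 0.55$. Thus $\gamma_\epsilon(0) = 6.667$, $\gamma_{\epsilon^2}(0) = 335.043$, $\rho_{\epsilon^2}(1) = 0.434694$. Thus

$$\frac{E\epsilon_t^2\epsilon_{t+i}^2}{\gamma_\epsilon^2(0)} = \frac{0.85^{i-1}\gamma_{\epsilon^2}(0)\rho_{\epsilon^2}(1) + \gamma_\epsilon^2(0)}{\gamma_\epsilon^2(0)}$$

for $i = 1, \ldots, 5$. Finally, using Theorem 5.1,

$$\lim_{n\to\infty}\mathrm{Var}\,\sqrt{n}\,\hat\rho_5 = \begin{pmatrix}
4.27692 & 0 & 0 & 0 & 0\\
0 & 3.78538 & 0 & 0 & 0\\
0 & 0 & 3.36758 & 0 & 0\\
0 & 0 & 0 & 3.01244 & 0\\
0 & 0 & 0 & 0 & 2.71057
\end{pmatrix}.$$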
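The diagonal above can be reproduced from the moment formulas just listed. The following is a minimal sketch, with the kurtosis $\kappa = E\eta_t^4$ left as a parameter (Gaussian $\kappa = 3$ by default, which is the value consistent with the numbers above). The function name is hypothetical.

```python
def garch11_acf_avar(omega, alpha, beta, kappa=3.0, lags=5):
    """Diagonal of lim Var sqrt(n)(rho_hat(1), ..., rho_hat(lags)) for a GARCH(1,1)
    with E eta_t^4 = kappa, i.e. E(eps_t^2 eps_{t+i}^2) / gamma_eps(0)^2."""
    s = alpha + beta
    d4 = 1.0 - beta ** 2 - 2.0 * alpha * beta - kappa * alpha ** 2
    if s >= 1.0 or d4 <= 0.0:
        raise ValueError("E eps_t^4 must be finite")
    g0 = omega / (1.0 - s)                                  # gamma_eps(0)
    m4 = kappa * omega ** 2 * (1.0 + s) / ((1.0 - s) * d4)  # E eps_t^4
    g2 = m4 - g0 ** 2                                       # gamma_{eps^2}(0)
    r1 = alpha * (1.0 - beta ** 2 - alpha * beta) / (1.0 - beta ** 2 - 2.0 * alpha * beta)
    # gamma_{eps^2}(i) = (alpha + beta)^(i-1) * rho_{eps^2}(1) * gamma_{eps^2}(0)
    return [(s ** (i - 1) * r1 * g2 + g0 ** 2) / g0 ** 2 for i in range(1, lags + 1)]


diag = garch11_acf_avar(1.0, 0.3, 0.55)
```

With $(\omega, \alpha, \beta) = (1, 0.3, 0.55)$ the five entries agree with the matrix above to five decimals. Because $\rho_{\epsilon^2}(h)$ decays like $0.85^{h-1}$, the entries decrease slowly towards the strong-noise value 1.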
5.8 Since $\gamma_X(\ell) = 0$ for all $|\ell| > q$, we clearly have
+q 
vi = >> oO. 
t=-q 
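Since $\rho_{\epsilon^2}(\ell) = \alpha^{|\ell|}$, $\gamma_\epsilon(0) = \omega/(1-\alpha)$ and

$$\gamma_{\epsilon^2}(0) = \frac{2\omega^2}{(1-\alpha)^2(1-3\alpha^2)}$$

(see, for instance, Exercise 2.8), we have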
vf, = — Y allor @ +i {ox € +i) + ox- 
1 -— 3g? 7 


2 í i—l 2 
= za O. 
1 — 3 = 


Note that $v_{ii}^* \to 0$ as $i \to \infty$.


5.9 Conditionally on initial values, the score vector is given by 


3t, (0) 1% ð f0) P 
pict eee 2 l 
00 ale Toeg 


tan aFo(Wr) , 1 (20) ðo? 
Fe E eO + 5 (LP — 1) oer 
a Fa (W, 
+ i e«(0) ae 2 


where $\epsilon_t(\theta) = Y_t - F_\theta(W_t)$. We thus have


io = 1 pa) {rae} 
n= = E i — ae E 
o aw aw’ 


2 


and, when $\sigma_t^2$ does not depend on $\theta$,


1 | o | 3 Foy (Wr) | 1 | ced | a} 


je eA To = BOE) | oP) 
1o oe ap ap 2 g? ap aw! 
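5.10 In the notation of Section 5.4.1 and denoting by $\beta = (\beta_1', \beta_2')'$ the parameter of interest, the log-likelihood is equal to

$$\ell_n(\beta) = -\frac{1}{2\sigma^2}\|Y - X_1\beta_1 - X_2\beta_2\|^2,$$

up to a constant. The constrained estimator is $\hat\beta^c = (\hat\beta_1^{c\prime}, 0')'$, with $\hat\beta_1^c = (X_1'X_1)^{-1}X_1'Y$. The constrained score and the Lagrange multiplier are related by

$$\frac{\partial}{\partial\beta}\ell_n(\hat\beta^c) = \begin{pmatrix}0\\ \hat\lambda\end{pmatrix},\qquad \hat\lambda = \frac{1}{\sigma^2}X_2'\hat U^c,\qquad \hat U^c = Y - X_1\hat\beta_1^c.$$

On the other hand, the exact laws of the estimators under $H_0$ are given by

$$\sqrt{n}(\hat\beta - \beta) \sim \mathcal{N}(0, I^{-1}),\qquad I = \frac{X'X}{n\sigma^2},$$

and

$$\frac{\hat\lambda}{\sqrt{n}} \sim \mathcal{N}\{0, (I^{22})^{-1}\},\qquad (I^{22})^{-1} = I_{22} - I_{21}I_{11}^{-1}I_{12},$$

with $I_{ij} = \frac{1}{n\sigma^2}X_i'X_j$.

For the case $X_1'X_2 = 0$, we can estimate $(I^{22})^{-1}$ by

$$\hat I_{22} = \frac{1}{n\hat\sigma^{c2}}X_2'X_2,\qquad n\hat\sigma^{c2} = \|\hat U^c\|^2.$$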


The test statistic is then equal to

$$LM_n = n\,\frac{\hat U^{c\prime}X_2(X_2'X_2)^{-1}X_2'\hat U^c}{\hat U^{c\prime}\hat U^c} = n\,\frac{\hat\sigma^{c2} - \hat\sigma^2}{\hat\sigma^{c2}} = nR^2, \tag{C.6}$$

with

$$n\hat\sigma^2 = \|\hat U\|^2,\qquad \hat U = Y - X\hat\beta,$$

and where $R^2$ is the coefficient of determination (centered if $X_1$ admits a constant column) in the regression of $\hat U^c$ on the columns of $X_2$. For the first equality of (C.6), we use the fact that in a regression model of the form $Y = X\beta + U$, with obvious notation, Pythagoras's theorem yields

$$Y'X(X'X)^{-1}X'Y = \|\hat Y\|^2 = \|Y\|^2 - \|\hat U\|^2.$$


In the general case, we have

$$LM_n = n\,\frac{\hat U^{c\prime}X_2\{X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\}^{-1}X_2'\hat U^c}{\hat U^{c\prime}\hat U^c} = n\,\frac{\hat\sigma^{c2} - \hat\sigma^2}{\hat\sigma^{c2}}.$$

Since the residuals of the regression of $Y$ on the columns of $X_1$ and $X_2$ are also the residuals of the regression of $\hat U^c$ on the columns of $X_1$ and $X_2$, we obtain $LM_n$ by:

1. computing the residuals $\hat U^c$ of the regression of $Y$ on the columns of $X_1$;

2. regressing $\hat U^c$ on the columns of $X_2$ and $X_1$, and setting $LM_n = nR^2$, where $R^2$ is the coefficient of determination of this second regression.
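The two-step recipe can be sketched in plain Python. This is a minimal illustration, not the book's code: the helpers `ols_residuals` and `lm_statistic` are hypothetical names, the normal equations are solved by Gauss–Jordan elimination (adequate only for small, well-conditioned designs), and $X_1$ is assumed to contain a constant column, so that the centered $R^2$ is used.

```python
def ols_residuals(y, cols):
    """Least-squares residuals of y on the columns in `cols` (lists of length n),
    solving the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(cols)
    aug = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
           + [sum(u * v for u, v in zip(cols[i], y))] for i in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(aug[r][p]))  # partial pivoting
        aug[p], aug[piv] = aug[piv], aug[p]
        for r in range(k):
            if r != p:
                f = aug[r][p] / aug[p][p]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[p])]
    b = [aug[i][k] / aug[i][i] for i in range(k)]
    return [yt - sum(b[j] * cols[j][t] for j in range(k)) for t, yt in enumerate(y)]


def lm_statistic(y, x1_cols, x2_cols):
    """LM_n = n R^2 by the two-step recipe (X1 assumed to contain a constant)."""
    n = len(y)
    u_c = ols_residuals(y, x1_cols)            # step 1: residuals under H0
    u = ols_residuals(u_c, x2_cols + x1_cols)  # step 2: regress them on X2 and X1
    mean_uc = sum(u_c) / n
    sst = sum((v - mean_uc) ** 2 for v in u_c)
    ssr = sum(v * v for v in u)
    return n * (1.0 - ssr / sst)
```

For example, with $X_1$ made of a constant and a trend, taking $X_2 = \hat U^c$ itself makes the second regression fit perfectly, so $R^2 = 1$ and $LM_n = n$.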
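5.11 Since $\bar Y = 0$, it is clear that $R^2_{nc} = R^2$, with $R^2_{nc}$ and $R^2$ defined in (5.29) and (5.30). Since $T$ is invertible, we have $\mathrm{Col}(\tilde X) = \mathrm{Col}(X)$, where $\mathrm{Col}(Z)$ denotes the vector subspace generated by the columns of the matrix $Z$, and

$$P_{\tilde X} = XT(T'X'XT)^{-1}T'X' = P_X.$$

If $e \in \mathrm{Col}(X)$ then $e \in \mathrm{Col}(\tilde X)$ and

$$P_{\tilde X}\tilde Y = cP_XY + dP_Xe = c\hat Y + de.$$

Noting that $\bar{\tilde Y} := n^{-1}\sum_{t=1}^n\tilde Y_t = c\bar Y + d = d$, we conclude that

$$\tilde R^2 = \frac{\|P_{\tilde X}\tilde Y - \bar{\tilde Y}e\|^2}{\|\tilde Y - \bar{\tilde Y}e\|^2} = \frac{c^2\|\hat Y\|^2}{c^2\|Y\|^2} = R^2_{nc}.$$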


5.12 Figure C.4 and Tables C.1-C.6 lead us to select the conditionally heteroscedastic weak white 
noise model for the S&P 500 and the DAX, but one can also try several ARMA models on 
the S&P 500 (see Table C.7). From Table C.8, a GARCH(1, 1) seems plausible for the S&P 
500. For the DAX index, we can envisage the GARCH(2, 1) and GARCH(2, 2) models in 
particular (see Table C.6). 


[Figure C.4 panels: S&P Returns, DAX Returns.]

Figure C.4 Correlograms of the returns and squares of the returns of the S&P 500 index from March 2, 1990 to December 29, 2006 and of the DAX index from November 27, 1990 to April 4, 2007.
Table C.1 Portmanteau tests on the squares of the DAX and S&P 500 returns.

Tests for noncorrelation of the squared DAX returns

m                                          1          2          3          4          5          6
$\hat\rho_{\epsilon^2}(m)$                 0.185      0.284      0.246      0.241      0.239      0.279
$\hat\sigma_{\hat\rho_{\epsilon^2}}(m)$    0.031      0.031      0.031      0.031      0.031      0.031
$Q_m^{LB}$                                 141.144    475.003    725.039    964.026    1201.023   1523.802
p-value                                    0.000      0.000      0.000      0.000      0.000      0.000

m                                          7          8          9          10         11         12
$\hat\rho_{\epsilon^2}(m)$                 0.230      0.270      0.186      0.214      0.213      0.215
$\hat\sigma_{\hat\rho_{\epsilon^2}}(m)$    0.031      0.031      0.031      0.031      0.031      0.031
$Q_m^{LB}$                                 1742.534   2043.226   2186.525   2376.318   2564.566   2755.603
p-value                                    0.000      0.000      0.000      0.000      0.000      0.000

Tests for noncorrelation of the squared S&P 500 returns

m                                          1          2          3          4          5          6
$\hat\rho_{\epsilon^2}(m)$                 0.204      0.200      0.192      0.148      0.203      0.141
$\hat\sigma_{\hat\rho_{\epsilon^2}}(m)$    0.030      0.030      0.030      0.030      0.030      0.030
$Q_m^{LB}$                                 177.035    346.639    503.346    595.875    771.254    855.922
p-value                                    0.000      0.000      0.000      0.000      0.000      0.000

m                                          7          8          9          10         11         12
$\hat\rho_{\epsilon^2}(m)$                 0.160      0.151      0.115      0.148      0.141      0.135
$\hat\sigma_{\hat\rho_{\epsilon^2}}(m)$    0.030      0.030      0.030      0.030      0.030      0.030
$Q_m^{LB}$                                 964.803    1061.963   1118.258   1211.899   1296.512   1374.324
p-value                                    0.000      0.000      0.000      0.000      0.000      0.000


Table C.2 LM tests for conditional homoscedasticity of the DAX and S&P 500 returns.

Tests for absence of ARCH effect for the DAX

m          1       2       3       4       5       6       7       8       9
$LM_n$     141.0   408.3   524.3   590.0   640.5   723.4   746.9   789.0   789.3
p-value    0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000

Tests for absence of ARCH effect for the S&P 500

m          1       2       3       4       5       6       7       8       9
$LM_n$     176.9   287.7   358.0   376.7   442.0   449.8   467.8   478.6   479.6
p-value    0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
Table C.3 Portmanteau tests on the series of DAX returns from November 27, 1990 to April 4, 2007.


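m
$\hat\rho(m)$
$\hat\sigma_{\hat\rho}(m)$
$Q_m$
p-value

m
$\hat\rho(m)$
$\hat\sigma_{\hat\rho}(m)$
$Q_m$
p-value

m
$\hat\rho(m)$
$\hat\sigma_{\hat\rho}(m)$
$Q_m$
p-value

m
$\hat\rho(m)$
$\hat\sigma_{\hat\rho}(m)$
$Q_m$
p-value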


0.016 
0.030 
1.105 
0.293 


9 
0.000 
0.030 

25.629 
0.002 


2 
—0.007 
0.050 
0.148 
0.929 


10 
—0.025 
0.046 
11.739 
0.303 


2 
—0.020 
0.030 
2.882 
0.237 


10 
0.011 
0.030 

26.109 
0.004 


Tests for GARCH noise based on $Q_m$


3 
—0.040 
0.047 
2.933 
0.402 


11 
0.027 
0.046 

13.095 
0.287 


4 
—0.036 
0.047 
5.140 
0.273 


12 
—0.002 
0.046 
13.103 
0.362 


5 
—0.019 
0.047 
5.777 
0.328 


13 
—0.002 
0.044 
13.110 
0.439 


Usual tests for strong white noise


3 
—0.045 
0.030 
11.614 
0.009 


11 
0.010 
0.030 

26.579 
0.005 


4 
0.015 
0.030 

12.611 
0.013 


12 
—0.014 
0.030 
27.397 
0.007 


5 
—0.041 
0.030 
19.858 
0.001 


13 
0.020 
0.030 

29.059 
0.006 


14 
0.024 
0.030 

31.497 
0.005 


—0.025 
0.030 
24.826 
0.001 


15 
0.037 
0.030 

37.271 
0.001 


16 
0.001 
0.030 

37.219 
0.002 


Table C.4 Portmanteau tests on the S&P 500 returns from March 2, 1990 to December 29, 2006.


$\hat\rho(m)$
$\hat\sigma_{\hat\rho}(m)$
$Q_m$
p-value

m
$\hat\rho(m)$
$\hat\sigma_{\hat\rho}(m)$
$Q_m$
p-value


1 
—0.002 
0.045 
0.011 
0.915 


9 
0.008 
0.039 

11.392 
0.250 


Tests for GARCH noise based on $Q_m$


3 
—0.034 
0.044 
3.665 
0.300 


11 
—0.013 
0.041 
12.212 
0.348 


4 
0.007 
0.041 
3.773 
0.438 


12 
0.049 
0.041 

17.734 
0.124 


5 
—0.034 
0.045 
6.039 
0.302 


13 
0.034 
0.039 

20.630 
0.081 


Usual tests for strong white noise 


3 
—0.034 
0.030 
7.964 
0.047 


11 
—0.013 
0.030 
25.069 
0.009 


5 
—0.034 
0.030 
13.176 
0.022 


13 
0.034 
0.030 

40.052 
0.000 
Table C.5 Studentized statistics for the corner method and selected ARMA orders, on the DAX returns.


      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
a =0.3 =0,3 -1.7 1.5 -0.8 -1.7 =0.7 1,3 -0,1 =i. 1.2 0.1 -0.1 1.8 0.8 
2 0:3 -0,23 0.9 0.2 1,1 0.7 1.0 0:6 0.8 0.5 0.6 0,1 -0:1 0.9 

3 =1;7 0.9 -0.6 -0.6 -0.9 0.1 =0.6 =0.1 0.2 O.1 O.4 0.6 0.7 

4 =1L:5 0.2 0.6 -0.7 0.7 =0.5 0.4 0.4 0.1 =0.2 0.1 0.3 

5 -0.8 1,1 -0.9 0.7 -0.5 0.3 -0.4 -0:1 0.2 0.0 0.2 

6 1.7 0.7 0-0 -0.5 40.3 0.3 0-a 0,3 0-3 0.2 

7 -0.6 1.0 -0.6 0.3 -0.4 0.4 -0.3 0.4 0.1 

8 -1.1 0.5 0.2 0.4 0.2 0.3 -0.4 0.4 

9 -0.2 0.8 0.1 0.1 0.2 0.3 0.2 

10 150 055 -0.1 -0.1 0.0 0.2 

11 1.3 0.6 0.3 0.1 0.2 

12 Lt 0.2 =0,5 0.2 

13 -0.2 0.1 0.8 

14 =2.1 1.1 

15 0.7 


ARMA (P,Q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL 


PROBA CRIT MODELS FOUND 
0.200000 1.28 ( 0,14) (1, 1) (14, 0) 
0.100000 1.64 ( 0,14) ( 1, 1) (14, 0)
0.050000 1.96 (0, 1) (14, 0) 

0.020000 2.33 (0, 0) 
0.010000 2.58 ( 0, 0) 
0.005000 2.81 (0, 0) 
0.002000 3.09 (0, 0) 
0.001000 3.29 (0, 0) 
0.000100 3.72 (0, 0) 
0.000010 4.26 ( 0, 0) 
Table C.6 Studentized statistics for the corner method and selected ARMA orders, on the squared DAX returns.


      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
1 4.4 6.0 7.0 4.4 5.1 4.7 4.5 4.7 4.2 4.1 4.0 3.9 4.2 4.5 4.8 
2 -6.6 -8 -0.4 20-=0.9 220 =120° 228° =1.9 026. 020 0.9 -1.3 a 
3 8.4 -0.2 0.4 -0.3 0.5 0.4 -0.9 0.8 1.4 0.3 -0.5 0.5 0.3 
4 25:29. =Or2s- 10h dL, -6 -0.4 0.4 0.4 0.8 -1.0 0.8 -0.7 0.5 
5 7.1 -1.1 0.6 -0.4 -0.4 0.3 0.6 0.9 0.7 -0.7 -0.3 
6 -4.2 -0 -0.4 0.4 -0.4 0.5 -0.5 0.7 -0.7 0.6 
7 221. =1.3 =-1.0 0.3 0.7 -0.3 -0.6 0.5 0.3 
8 =3.9 2.8 =1.1 1.0 =1:0 0.7 =0.6 0.5 
9 0.4 -1.4 2.0 -1.4 0.7 -0.8 0.0 
10 skel 0.3 -0,5 1.0 O27 0.6 
11 1.8 -4 -0.1 -0.8 -0.3 
12 2155 29 S07 . 06 
13 0.3 -0.9 0.2 
14 =0'9 -6 
15 =O al 
GARCH (p,q) MODELS FOUND WITH GIVEN SIGNIFICANCE LEVEL 
PROBA CRIT MODELS FOUND 
0.200000 1.28 ( 4, 1) ( 4, 2) ( 4, 3) ( 4, 4) (de: 9) ( 0,12) 
0.100000 1.64 (3, 1) C235 2) G Sy 3) ( Le 9) (0,17) 
0.050000 1.96 ( 2, 1) (2, 2) (0, 8) 
0.020000 2.33 (a-i) (C252) (0; 289) 
0.010000 2.58 ( 2, 1) ( 2, 2) (0, 8) 
0.005000 2.81 ( 2, 1) ( 2, 2) ( 0, 8)
0.002000 3.09 ( 1, 1) ( 0, 8)
0.001000 3.29 (1, 1) ( 0, 8) 
0.000100 3.72 (1, 1) (0, 8) 
0.000010 4.26 (1, 1) G 205, S} 
Table C.7 Studentized statistics for the corner method and selected ARMA orders, on the S&P 500 returns.


[Table body not recoverable: studentized statistics and the ARMA(P,Q) models found at significance levels from 0.2 down to 0.00001.]
Table C.8 Studentized statistics for the corner method and selected ARMA orders, on the squared S&P 500 returns.
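Chapter 6

6.1 From the observations $\epsilon_1, \ldots, \epsilon_n$, we can compute $\overline{\epsilon^2} = n^{-1}\sum_{t=1}^n\epsilon_t^2$ and

$$\hat\gamma_{\epsilon^2}(h) = \hat\gamma_{\epsilon^2}(-h) = \frac1n\sum_{t=1+h}^{n}\left(\epsilon_t^2 - \overline{\epsilon^2}\right)\left(\epsilon_{t-h}^2 - \overline{\epsilon^2}\right),\qquad \hat\rho_{\epsilon^2}(h) = \frac{\hat\gamma_{\epsilon^2}(h)}{\hat\gamma_{\epsilon^2}(0)},$$

for $h = 0, \ldots, q$. We then put

$$a_{1,1} = \hat\rho_{\epsilon^2}(1)$$

and then, for $k = 2, \ldots, q$ (when $q > 1$),

$$a_{k,k} = \frac{\hat\rho_{\epsilon^2}(k) - \sum_{i=1}^{k-1}\hat\rho_{\epsilon^2}(k-i)\,a_{k-1,i}}{1 - \sum_{i=1}^{k-1}\hat\rho_{\epsilon^2}(i)\,a_{k-1,i}},\qquad a_{k,i} = a_{k-1,i} - a_{k,k}\,a_{k-1,k-i},\quad i = 1, \ldots, k-1.$$

With standard notation, the OLS estimators are then

6.2 The assumption that $X$ has full column rank implies that $X'X$ is invertible. Denoting by $\langle\cdot,\cdot\rangle$ the scalar product associated with the Euclidean norm, we have

$$\langle Y - X\hat\theta_n, X(\hat\theta_n - \theta)\rangle = Y'\{X - X(X'X)^{-1}X'X\}(\hat\theta_n - \theta) = 0$$

and

$$\begin{aligned}
nQ_n(\theta) &= \|Y - X\theta\|^2\\
&= \|Y - X\hat\theta_n\|^2 + \|X(\hat\theta_n - \theta)\|^2 + 2\langle Y - X\hat\theta_n, X(\hat\theta_n - \theta)\rangle\\
&\ge \|Y - X\hat\theta_n\|^2 = nQ_n(\hat\theta_n),
\end{aligned}$$

with equality if and only if $\theta = \hat\theta_n$, and we are done.

6.3 We can take $n = 2$, $q = 1$, $\epsilon_0 = 0$, $\epsilon_1 = 1$, $\epsilon_2 = 0$. The calculation yields $(\hat\omega, \hat\alpha)' = (1, -1)'$.

6.4 Case 3 is not possible, otherwise we would have

$$\epsilon_t^2 < \epsilon_t^2 - \hat\omega - \hat\alpha_1\epsilon_{t-1}^2 - \hat\alpha_2\epsilon_{t-2}^2$$

for all $t$, and consequently $\|Y\|^2 < \|Y - X\hat\theta_n\|^2$, which is not possible.

Using the data, we obtain $\hat\theta = (1, -1, -1/2)$, and thus $\hat\theta^c \ne \hat\theta$. Therefore, the constrained estimate must coincide with one of the following three constrained estimates: that constrained by $\alpha_2 = 0$, that constrained by $\alpha_1 = 0$, or that constrained by $\alpha_1 = \alpha_2 = 0$. The estimate constrained by $\alpha_2 = 0$ is $\tilde\theta = (7/12, -1/2, 0)$, and thus does not suit. The estimate constrained by $\alpha_1 = 0$ yields the desired estimate $\hat\theta^c = (1/4, 0, 1/4)$.

6.5 First note that $\epsilon_t^2 \ge \omega_0\eta_t^2$. Thus $\epsilon_t^2 = 0$ if and only if $\eta_t = 0$. The nullity of the $i$th column of $X$, for $i > 1$, implies that $\eta_{n-i+1} = \cdots = \eta_2 = \eta_1 = 0$. The probability of this event tends to 0 as $n \to \infty$ because, since $E\eta_t^2 = 1$, we have $P(\eta_t = 0) < 1$.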
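The recursion of Exercise 6.1 is short enough to code directly. The following minimal sketch (function names are hypothetical) computes $\hat\rho_{\epsilon^2}(0), \ldots, \hat\rho_{\epsilon^2}(q)$ with the $1/n$ normalization used above and then the coefficients $a_{q,1}, \ldots, a_{q,q}$. The check uses the exact autocorrelations of an AR(2) with coefficients 0.5 and 0.2, an illustrative input for which the recursion must return $(0.5, 0.2, 0)$ at $q = 3$.

```python
def sample_acf_squares(eps, q):
    """rho_hat_{eps^2}(h), h = 0..q, from eps_1..eps_n (squares centered, 1/n scaling)."""
    x = [e * e for e in eps]
    n = len(x)
    m = sum(x) / n
    gam = [sum((x[t] - m) * (x[t - h] - m) for t in range(h, n)) / n for h in range(q + 1)]
    return [g / gam[0] for g in gam]


def durbin_levinson(rho, q):
    """Coefficients [a_{q,1}, ..., a_{q,q}] from rho[0..q] (rho[0] = 1)."""
    if q < 1 or len(rho) <= q:
        raise ValueError("need autocorrelations rho[0], ..., rho[q]")
    a = [rho[1]]                                   # a_{1,1} = rho(1)
    for k in range(2, q + 1):
        prev = a                                   # prev[i - 1] = a_{k-1,i}
        num = rho[k] - sum(rho[k - i] * prev[i - 1] for i in range(1, k))
        den = 1.0 - sum(rho[i] * prev[i - 1] for i in range(1, k))
        akk = num / den                            # a_{k,k}
        a = [prev[i - 1] - akk * prev[k - i - 1] for i in range(1, k)] + [akk]
    return a


# Exact autocorrelations of an AR(2) with coefficients 0.5 and 0.2 (illustrative input).
rho_ar2 = [1.0, 0.625, 0.5125, 0.38125]
coeffs = durbin_levinson(rho_ar2, 3)               # close to [0.5, 0.2, 0.0]
```

In practice one would call `durbin_levinson(sample_acf_squares(eps, q), q)` on the observed series.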
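6.6 Introducing an initial value $X_0$, the OLS estimator of $\phi_0$ is

$$\hat\phi_n = \left(\frac1n\sum_{t=1}^nX_{t-1}^2\right)^{-1}\frac1n\sum_{t=1}^nX_tX_{t-1},$$

and this estimator satisfies

$$\sqrt{n}(\hat\phi_n - \phi_0) = \left(\frac1n\sum_{t=1}^nX_{t-1}^2\right)^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^n\epsilon_tX_{t-1}.$$

Under the assumptions of the exercise, the ergodic theorem entails the almost sure convergence

$$\frac1n\sum_{t=1}^nX_{t-1}^2 \to EX_1^2,\qquad \frac1n\sum_{t=1}^nX_tX_{t-1} \to EX_1X_0 = \phi_0EX_1^2,$$

and thus the almost sure convergence of $\hat\phi_n$ to $\phi_0$. For the consistency the assumption $E\epsilon_t^2 < \infty$ suffices.

If $E\epsilon_t^4 < \infty$, the sequence $(\epsilon_tX_{t-1}, \mathcal{F}_t)$ is a stationary and ergodic square integrable martingale difference, with variance

$$\mathrm{Var}(\epsilon_tX_{t-1}) = E(\sigma_t^2X_{t-1}^2).$$

We can see that this expectation exists by expanding the product

$$\sigma_t^2X_{t-1}^2 = \left(\omega_0 + \sum_{i=1}^q\alpha_{0i}\epsilon_{t-i}^2\right)\left(\sum_{i=0}^\infty\phi_0^i\epsilon_{t-1-i}\right)^2.$$

The CLT of Corollary A.1 then implies that

$$\frac{1}{\sqrt{n}}\sum_{t=1}^n\epsilon_tX_{t-1} \xrightarrow{\mathcal{L}} \mathcal{N}\{0, E(\sigma_t^2X_{t-1}^2)\},$$

and thus

$$\sqrt{n}(\hat\phi_n - \phi_0) \xrightarrow{\mathcal{L}} \mathcal{N}\{0, (EX_1^2)^{-2}E(\sigma_t^2X_{t-1}^2)\}.$$

When $\sigma_t^2 = \omega_0$, the condition $E\epsilon_t^2 < \infty$ suffices for asymptotic normality.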
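Consistency of $\hat\phi_n$ under ARCH errors can be checked by a small simulation. This is a minimal sketch, assuming ARCH(1) errors with Gaussian $\eta_t$. The values $\phi_0 = 0.5$, $\omega = 1$, $\alpha = 0.3$ (so that $3\alpha^2 < 1$ and $E\epsilon_t^4 < \infty$), the burn-in length and the seed are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math
import random


def simulate_ar1_arch1(n, phi, omega, alpha, seed=0, burn=500):
    """Simulate X_t = phi X_{t-1} + eps_t, eps_t = sigma_t eta_t,
    sigma_t^2 = omega + alpha eps_{t-1}^2, with Gaussian eta_t."""
    rng = random.Random(seed)
    x_prev = eps_prev = 0.0
    path = []
    for t in range(n + burn):
        sigma = math.sqrt(omega + alpha * eps_prev ** 2)
        eps = sigma * rng.gauss(0.0, 1.0)
        x = phi * x_prev + eps
        if t >= burn:                  # discard the start-up period
            path.append(x)
        x_prev, eps_prev = x, eps
    return path


def ols_ar1(path):
    """OLS estimator sum X_t X_{t-1} / sum X_{t-1}^2, path[0] playing the role of X_0."""
    num = sum(path[t] * path[t - 1] for t in range(1, len(path)))
    den = sum(path[t - 1] ** 2 for t in range(1, len(path)))
    return num / den


phi_hat = ols_ar1(simulate_ar1_arch1(20000, phi=0.5, omega=1.0, alpha=0.3, seed=123))
```

With $n = 20000$ the estimate falls close to $\phi_0 = 0.5$. Its sampling spread is governed by the sandwich variance $(EX_1^2)^{-2}E(\sigma_t^2X_{t-1}^2)$ rather than by the homoscedastic formula.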
6.7 By direct verification, $A^{-1}A = I$.
6.8 1. Let $\tilde\epsilon_t = \epsilon_t/\sqrt{\omega_0}$. Then $(\tilde\epsilon_t)$ solves the model

$$\tilde\epsilon_t = \left(1 + \sum_{i=1}^q\alpha_{0i}\tilde\epsilon_{t-i}^2\right)^{1/2}\eta_t.$$

The parameter $\omega_0$ vanishing in this equation, the moments of $\tilde\epsilon_t$ do not depend on it. It follows that $E\epsilon_t^{2m} = E(\sqrt{\omega_0}\,\tilde\epsilon_t)^{2m} = K\omega_0^m$, where $K = E\tilde\epsilon_t^{2m}$ does not depend on $\omega_0$.

2. and 3. Write $M = M(\omega_0^h)$ to indicate that a matrix $M$ is proportional to $\omega_0^h$. Partition the vector $Z_t = (1, \epsilon_{t-1}^2, \ldots, \epsilon_{t-q}^2)'$ into $Z_t = (1, W_{t-1}')'$ and, accordingly, the matrices $A$ and $B$ of Theorem 6.2. Using the previous question and the notation of Exercise 6.7, we obtain

$$A_{11} = A_{11}(1),\qquad A_{12} = A_{21}' = A_{12}(\omega_0),\qquad A_{22} = A_{22}(\omega_0^2).$$

We then have

$$A^{-1} = \begin{pmatrix}\{A_{11}^{-1} + A_{11}^{-1}A_{12}FA_{21}A_{11}^{-1}\}(1) & \{-A_{11}^{-1}A_{12}F\}(\omega_0^{-1})\\ \{-FA_{21}A_{11}^{-1}\}(\omega_0^{-1}) & F(\omega_0^{-2})\end{pmatrix}.$$

Similarly,

$$B_{11} = B_{11}(\omega_0^2),\qquad B_{12} = B_{21}' = B_{12}(\omega_0^3),\qquad B_{22} = B_{22}(\omega_0^4).$$

It follows that $C = A^{-1}BA^{-1}$ is of the form

$$C = \begin{pmatrix}C_{11}(\omega_0^2) & C_{12}(\omega_0)\\ C_{21}(\omega_0) & C_{22}(1)\end{pmatrix}.$$

6.9 1. Let $a = \inf_{y\in C}\|x - y\|$. Let us show the existence of $x^*$. Let $(x_n)$ be a sequence of elements of $C$ such that, for all $n > 0$, $\|x - x_n\|^2 \le a^2 + 1/n$. Using the median (parallelogram) equality $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$, we have

$$\begin{aligned}
\|x_n - x_m\|^2 &= \|(x_n - x) - (x_m - x)\|^2\\
&= 2\|x_n - x\|^2 + 2\|x_m - x\|^2 - \|(x_n - x) + (x_m - x)\|^2\\
&= 2\|x_n - x\|^2 + 2\|x_m - x\|^2 - 4\|x - (x_m + x_n)/2\|^2\\
&\le 2(a^2 + 1/n) + 2(a^2 + 1/m) - 4a^2 = 2(1/n + 1/m),
\end{aligned}$$

the last inequality being justified by the fact that $(x_m + x_n)/2 \in C$, by the convexity of $C$, and by the definition of $a$. It follows that $(x_n)$ is a Cauchy sequence and, $E$ being a Hilbert space and therefore a complete metric space, $x_n$ converges to some point $x^*$. Since $C$ is closed, $x^* \in C$ and $\|x - x^*\| \ge a$. We also have $\|x - x^*\| \le a$, taking the limit on both sides of the inequality which defines the sequence $(x_n)$. It follows that $\|x - x^*\| = a$, which shows the existence.

Assume that there exist two solutions of the minimization problem in $C$, $x_1^*$ and $x_2^*$. Using the convexity of $C$, it is then easy to see that $(x_1^* + x_2^*)/2$ satisfies

$$\|x - (x_1^* + x_2^*)/2\| = \|x - x_1^*\| = \|x - x_2^*\|.$$

This is possible only if $x_1^* = x_2^*$ (once again using the median equality).

2. Let $\lambda \in (0, 1)$ and $y \in C$. Since $C$ is convex, $(1 - \lambda)x^* + \lambda y \in C$. Thus

$$\|x - x^*\|^2 \le \|x - \{(1 - \lambda)x^* + \lambda y\}\|^2 = \|x - x^* + \lambda(x^* - y)\|^2$$

and, dividing by $\lambda$,

$$\lambda\|x^* - y\|^2 + 2\langle x - x^*, x^* - y\rangle \ge 0.$$

Taking the limit as $\lambda$ tends to 0, we obtain inequality (6.17).

3. Let $z \in C$ be such that, for all $y \in C$, $\langle z - x, z - y\rangle \le 0$. We have

$$\|x - z\|^2 = \langle z - x, (z - y) + (y - x)\rangle \le \langle z - x, y - x\rangle \le \|x - z\|\,\|x - y\|,$$

the last inequality being simply the Cauchy–Schwarz inequality. It follows that $\|x - z\| \le \|x - y\|$, $\forall y \in C$. Since this property characterizes $x^*$ in view of part 1, it follows that $z = x^*$.
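A concrete instance makes the characterization (6.17) tangible. For a box $C = \prod_i[\ell_i, h_i] \subset \mathbb{R}^d$, which is closed and convex, the projection is coordinatewise clipping. The following minimal sketch checks, on a grid of points of $C$, both the minimum-distance property of part 1 and the inequality $\langle x - x^*, y - x^*\rangle \le 0$. The box, the point $x$ and the grid are arbitrary illustrative choices.

```python
import itertools
import math


def project_box(x, lo, hi):
    """Euclidean projection of x onto the closed convex box prod_i [lo_i, hi_i]."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


x = [2.0, -1.0, 0.5]
lo, hi = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
x_star = project_box(x, lo, hi)                                  # [1.0, 0.0, 0.5]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
points = [list(y) for y in itertools.product(grid, repeat=3)]    # points y of C

residual = [a - b for a, b in zip(x, x_star)]                    # x - x*
ok_vi = all(inner(residual, [a - b for a, b in zip(y, x_star)]) <= 1e-12 for y in points)
ok_min = all(dist(x, x_star) <= dist(x, y) + 1e-12 for y in points)
```

Both checks hold for every grid point, as (6.17) and part 1 predict.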
6.10 1. It suffices to show that when $C = K$, (6.17) is equivalent to (6.18). Since $0 \in K$, taking $y = 0$ in (6.17) we obtain $\langle x - x^*, x^*\rangle \ge 0$. Since $x^* \in K$ and $K$ is a cone, $2x^* \in K$. For $y = 2x^*$ in (6.17) we obtain $\langle x - x^*, x^*\rangle \le 0$, and it follows that $\langle x - x^*, x^*\rangle = 0$. The second equation of (6.18) then follows directly from (6.17). The converse, (6.18) $\Rightarrow$ (6.17), is trivial.

2. Since $x^* \in K$, then $z = \lambda x^* \in K$ for $\lambda \ge 0$. By (6.18), we have

$$\langle\lambda x - z, z\rangle = \lambda^2\langle x - x^*, x^*\rangle = 0,\qquad \langle\lambda x - z, y\rangle = \lambda\langle x - x^*, y\rangle \le 0,\quad \forall y \in K.$$

It follows that $(\lambda x)^* = z$ and (a) is shown. The properties (b) are obvious, expanding $\|x^* + (x - x^*)\|^2$ and using the first equation of (6.18).

6.11 The model is written as $Y = X^{(1)}\theta^{(1)} + X^{(2)}\theta^{(2)} + U$. Thus, since $M_2X^{(2)} = 0$, we have $M_2Y = M_2X^{(1)}\theta^{(1)} + M_2U$. Note that this is a linear model, of parameter $\theta^{(1)}$. Noting that $M_2'M_2 = M_2$, since $M_2$ is an orthogonal projection matrix, the form of the estimator follows.

6.12 Since $J_n$ is symmetric, there exists a diagonal matrix $D_n$ and an orthonormal matrix $P_n$ such that $J_n = P_nD_nP_n'$. For $n$ large enough, the eigenvalues of $J_n$ are positive since $J = \lim_{n\to\infty}J_n$ is positive definite. Let $\lambda_n$ be the smallest eigenvalue of $J_n$. Denoting by $\|\cdot\|$ the Euclidean norm, we have

$$X_n'J_nX_n = X_n'P_nD_nP_n'X_n \ge \lambda_nX_n'P_nP_n'X_n = \lambda_n\|X_n\|^2.$$

Since $\lim_{n\to\infty}X_n'J_nX_n = 0$ and $\lim_{n\to\infty}\lambda_n > 0$, it follows that $\lim_{n\to\infty}\|X_n\| = 0$, and thus that $X_n$ converges to the zero vector.

6.13 Applying the method of Section 6.3.2, we obtain $X^* = (1, 1)'$ and thus, by Theorem 6.8, $\hat\theta^c = (2, 0)'$.
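For a cone, Exercise 6.10 replaces (6.17) by the two conditions of (6.18). The nonnegative orthant $K = [0, \infty)^d$ is a simple closed convex cone whose projection is $x^* = \max(x, 0)$ coordinatewise. The following minimal sketch checks $\langle x - x^*, x^*\rangle = 0$, $\langle x - x^*, y\rangle \le 0$ for a few $y \in K$, the homogeneity $(\lambda x)^* = \lambda x^*$ and the Pythagorean identity $\|x\|^2 = \|x^*\|^2 + \|x - x^*\|^2$. The vectors and $\lambda$ are arbitrary illustrative choices.

```python
def project_orthant(x):
    """Projection onto the closed convex cone K = [0, inf)^d."""
    return [max(xi, 0.0) for xi in x]


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


x = [1.5, -2.0, 0.0, 3.0]
x_star = project_orthant(x)
r = [a - b for a, b in zip(x, x_star)]                 # x - x*

orth = inner(r, x_star)                                # should be 0
cone_ys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.3, 2.0, 5.0, 0.1]]
neg = [inner(r, y) for y in cone_ys]                   # should all be <= 0

lam = 2.5
homog = project_orthant([lam * xi for xi in x]) == [lam * v for v in x_star]
pyth = inner(x, x) - (inner(x_star, x_star) + inner(r, r))   # should be 0
```

All four properties hold exactly here because the chosen numbers are representable in binary floating point.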


Chapter 7

7.1 1. When $j < 0$, all the variables involved in the expectation, except $\epsilon_{t-j}$, belong to the σ-field generated by $\{\epsilon_{t-j-1}, \epsilon_{t-j-2}, \ldots\}$. We conclude by taking the expectation conditionally on the previous σ-field and using the martingale increment property.


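2. For $j \ge 0$, we note that $g(\epsilon_t^2, \epsilon_{t-1}^2, \ldots)$ is a measurable function of $\eta_t^2, \ldots, \eta_{t-j+1}^2$ and of $\epsilon_{t-j}^2, \epsilon_{t-j-1}^2, \ldots$. Thus $E\{g(\epsilon_t^2, \epsilon_{t-1}^2, \ldots) \mid \epsilon_{t-j}, \epsilon_{t-j-1}, \ldots\}$ is an even function of the conditioning variables, denoted by $h(\epsilon_{t-j}^2, \epsilon_{t-j-1}^2, \ldots)$.

3. It follows that the expectation involved in the property can be written as

$$\begin{aligned}
&E\left[E\left\{h(\epsilon_{t-j}^2, \epsilon_{t-j-1}^2, \ldots)\,\epsilon_{t-j}\,f(\epsilon_{t-j-1}, \epsilon_{t-j-2}, \ldots) \mid \epsilon_{t-j-1}, \epsilon_{t-j-2}, \ldots\right\}\right]\\
&\quad= E\left\{f(\epsilon_{t-j-1}, \epsilon_{t-j-2}, \ldots)\int h(\sigma_{t-j}^2x^2, \epsilon_{t-j-1}^2, \ldots)\,\sigma_{t-j}\,x\,dP_\eta(x)\right\} = 0.
\end{aligned}$$

The latter equality follows from the nullity of the integral, because the distribution of $\eta_t$ is symmetric.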
7.2 By the Borel–Cantelli lemma, it suffices to show that for all real $\delta > 0$, the series of general terms $P(\rho^t\epsilon_t^2 > \delta)$ converges. That is to say,

$$\sum_{t=0}^{\infty}P(\rho^t\epsilon_t^2 > \delta) \le \sum_{t=0}^{\infty}\frac{\rho^{ts}E|\epsilon_t|^{2s}}{\delta^s} = \frac{E|\epsilon_1|^{2s}}{\delta^s(1 - \rho^s)} < \infty,$$

using Markov's inequality, strict stationarity and the existence of a moment of order $s > 0$ for $\epsilon_t^2$.


7.3 For all $\kappa > 0$, the process $(X_t^\kappa)$ is ergodic and admits an expectation. This expectation is finite since $X_t^\kappa \le \kappa$ and $(X_t^\kappa)^- = X_t^-$. We thus have, by the standard ergodic theorem,

$$\frac1n\sum_{t=1}^nX_t \ge \frac1n\sum_{t=1}^nX_t^\kappa \to E(X_1^\kappa),\quad\text{a.s. as } n \to \infty.$$

When $\kappa \to \infty$, the variable $X_1^\kappa$ increases to $X_1$. Thus by Beppo Levi's theorem $E(X_1^\kappa)$ converges to $E(X_1) = +\infty$. It follows that $n^{-1}\sum_{t=1}^nX_t$ tends almost surely to infinity.


7.4 1. The assumptions made on $f$ and $\Theta$ guarantee that $Y_t = \inf_{\theta\in\Theta}X_t(\theta)$ is a measurable function of $\eta_t, \eta_{t-1}, \ldots$. By Theorem A.1, it follows that $(Y_t)$ is stationary and ergodic.

2. If we remove condition (7.94), the property may not be satisfied. For example, let $\Theta = \{\theta_1, \theta_2\}$ and assume that the vectors $(X_t(\theta_1), X_t(\theta_2))$ are independent over $t$ and Gaussian, with zero mean, each component being of variance 1 and the covariance between the two components being different when $t$ is even and when $t$ is odd. Each of the two processes $(X_t(\theta_1))$ and $(X_t(\theta_2))$ is stationary and ergodic (as iid processes). However, $Y_t = \inf_\theta X_t(\theta) = \min(X_t(\theta_1), X_t(\theta_2))$ is not stationary in general because its distribution depends on the parity of $t$.


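7.5 1. In view of (7.30) and of the second part of assumption A1, we have

$$\begin{aligned}
\sup_{\theta\in\Theta}|Q_n(\theta) - \tilde Q_n(\theta)| &= \sup_{\theta\in\Theta}n^{-1}\left|\sum_{t=1}^n(\sigma_t^2 - \tilde\sigma_t^2)(\sigma_t^2 + \tilde\sigma_t^2 - 2\epsilon_t^2)\right|\\
&\le \sup_{\theta\in\Theta}Kn^{-1}\sum_{t=1}^n\{(2\sigma_t^2 + K\rho^t)\rho^t + 2\epsilon_t^2\rho^t\} \to 0
\end{aligned}\tag{C.7}$$

almost surely. Indeed, on a set of probability 1, we have for all $\iota > 0$,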


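$$\begin{aligned}
\limsup_{n\to\infty}\sup_{\theta\in\Theta}Kn^{-1}\sum_{t=1}^n\{(2\sigma_t^2 + K\rho^t)\rho^t + 2\epsilon_t^2\rho^t\} &\le \iota\,\limsup_{n\to\infty}n^{-1}\sum_{t=1}^n\Big\{2\sup_{\theta\in\Theta}\sigma_t^2(\theta) + K + 2\epsilon_t^2\Big\}\\
&= \iota\Big\{2E_{\theta_0}\sup_{\theta\in\Theta}\sigma_t^2(\theta) + K + 2E_{\theta_0}\epsilon_t^2\Big\},
\end{aligned}\tag{C.8}$$

using $K\rho^t \le \iota$ for all $t$ large enough and the ergodic theorem. Note that $E\epsilon_t^2 < \infty$ and (7.29) entail that $E_{\theta_0}\sup_{\theta\in\Theta}\sigma_t^2(\theta) < \infty$. The limit superior (C.8) being less than any positive number, it is null.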


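2. Note that $v_t := \epsilon_t^2 - \sigma_t^2(\theta_0) = \epsilon_t^2 - E_{\theta_0}(\epsilon_t^2 \mid \epsilon_{t-1}, \ldots)$ is the strong innovation of $\epsilon_t^2$. We thus have orthogonality between $v_t$ and any integrable variable which is measurable with respect to the σ-field generated by $\{\epsilon_u, u < t\}$. It follows that the asymptotic criterion is minimized at $\theta_0$:

$$\begin{aligned}
\lim Q_n(\theta) &= E_{\theta_0}\{\epsilon_t^2 - \sigma_t^2(\theta_0) + \sigma_t^2(\theta_0) - \sigma_t^2(\theta)\}^2\\
&= \lim Q_n(\theta_0) + E_{\theta_0}\{\sigma_t^2(\theta_0) - \sigma_t^2(\theta)\}^2 + 2E_{\theta_0}[v_t\{\sigma_t^2(\theta_0) - \sigma_t^2(\theta)\}]\\
&= \lim Q_n(\theta_0) + E_{\theta_0}\{\sigma_t^2(\theta_0) - \sigma_t^2(\theta)\}^2 \ge \lim Q_n(\theta_0),
\end{aligned}$$

with equality if and only if $\sigma_t^2(\theta) = \sigma_t^2(\theta_0)$ $P_{\theta_0}$-almost surely, that is, $\theta = \theta_0$ (by assumptions A3 and A4; see the proof of Theorem 7.1).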


3. We conclude that $\hat\theta_n$ is strongly consistent, as in (d) in the proof of Theorem 7.1, using a compactness argument and applying the ergodic theorem to show that, at any point $\theta_1 \in \Theta$ with $\theta_1 \ne \theta_0$, there exists a neighborhood $V(\theta_1)$ of $\theta_1$ such that

$$\liminf_{n\to\infty}\inf_{\theta\in V(\theta_1)}Q_n(\theta) > \lim_{n\to\infty}Q_n(\theta_0)\quad\text{a.s.}$$

4. Since all we have done remains valid when $\Theta$ is replaced by any smaller compact set containing $\theta_0$, for instance $\Theta^c$, the estimator $\hat\theta_n^c$ is strongly consistent.
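7.6 We know that $\hat\theta_n$ minimizes, over $\Theta$,

$$\tilde l_n(\theta) = n^{-1}\sum_{t=1}^n\frac{\epsilon_t^2}{\tilde\sigma_t^2(\theta)} + \log\tilde\sigma_t^2(\theta).$$

For all $c > 0$, there exists $\theta^*$ such that $\tilde\sigma_t^2(\theta^*) = c\,\tilde\sigma_t^2(\hat\theta_n)$ for all $t \ge 0$. Note that $\theta^* \ne \hat\theta_n$ if and only if $c \ne 1$. For instance, for a GARCH(1, 1) model, if $\hat\theta_n = (\hat\omega, \hat\alpha_1, \hat\beta_1)$ we have $\theta^* = (c\hat\omega, c\hat\alpha_1, \hat\beta_1)$. Let $f(c) = \tilde l_n(\theta^*)$. Since $f(c) = c^{-1}n^{-1}\sum_{t=1}^n\epsilon_t^2/\tilde\sigma_t^2(\hat\theta_n) + \log c + \text{constant}$, the minimum of $f$ is obtained at the unique point

$$c = n^{-1}\sum_{t=1}^n\frac{\epsilon_t^2}{\tilde\sigma_t^2(\hat\theta_n)}.$$

For this value $c$, we have $\theta^* = \hat\theta_n$. It follows that $c = 1$ with probability 1, which proves the result.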
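The scaling argument reduces to minimizing a one-dimensional function. The following minimal sketch evaluates $f(c)$ directly from the squared observations and the fitted volatilities. The numbers standing in for $\epsilon_t^2$ and $\tilde\sigma_t^2(\hat\theta_n)$ are made up for illustration, and the helper name is hypothetical. It confirms that $f$ is minimized at $c = n^{-1}\sum_t\epsilon_t^2/\tilde\sigma_t^2$, the mean of the squared standardized residuals.

```python
import math


def criterion_scaled(c, eps2, sig2):
    """Criterion value when every volatility sig2[t] is multiplied by c > 0."""
    return sum(e / (c * s) + math.log(c * s) for e, s in zip(eps2, sig2)) / len(eps2)


# Illustrative numbers standing in for eps_t^2 and sigma_t^2(theta_hat_n).
eps2 = [0.5, 2.0, 1.2, 0.1, 3.3]
sig2 = [1.0, 1.5, 0.8, 0.4, 2.0]
c_star = sum(e / s for e, s in zip(eps2, sig2)) / len(eps2)   # claimed unique minimizer
```

At a genuine QML estimate the minimizer must be $c = 1$, which is the property being proved: the squared standardized residuals average exactly to one.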


7.7 The expression for $\mathcal{I}_1$ is a trivial consequence of (7.74) and $\mathrm{Cov}(1 - \eta_t^2, \eta_t) = 0$. Similarly, the form of $\mathcal{I}_2$ directly follows from (7.38). Now consider the nondiagonal blocks. Using (7.38) and (7.74), we obtain


nE dl, de; 


1 3o? do? ) 
06; 00; i 


E —172)°E,, | L — 
Tew) = (1 — 7?) wae 30; Yo 


In view of (7.41), (7.42), (7.79) and (7.24), we have 


1 3o? ðo? 
a | 
o 0w dV; 


CO 
k 0€t—ky-i 
XO BFA, DBEA, bY 2a Ew {or Erk- Teie] =o, 


ky,ko=0 
1 bop ðo?
© | of dain O09; 


c dEr ami 
a Bọ (1, 1) By ad, DY 2a En {or te? kı—ioft-k2— | =0 


v; 
ky ,ko=0 ð J 


and 


oo kı 
= 5 {>> a6 ae gla- ‘La. DBEA, D 
0 


ky.ko=0 (¢=1 i=l 


q 
“ 
—4 2 t—ko-i 
fo (0+) awa) €t—ky-i (go) 


i'=1 


It follows that 


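$$\forall i, j,\quad E_{\varphi_0}\left\{\frac{1}{\sigma_t^4}\frac{\partial\sigma_t^2}{\partial\theta_i}\frac{\partial\sigma_t^2}{\partial\vartheta_j}(\varphi_0)\right\} = 0 \tag{C.9}$$

and $\mathcal{I}$ is block-diagonal. It is easy to see that $J$ has the form given in the theorem. The expressions for $J_1$ and $J_2$ follow directly from (7.39) and (7.75). The block-diagonal form follows from (7.76) and (C.9).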


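7.8 1. We have $\epsilon_t = X_t - aX_{t-1}$, $\sigma_t^2 = 1 + \alpha\epsilon_{t-1}^2$. The parameters to be estimated are $a$ and $\alpha$, $\omega$ being known. We have

$$\frac{\partial\epsilon_t}{\partial a} = -X_{t-1},\qquad \frac{\partial\sigma_t^2}{\partial a} = -2\alpha\epsilon_{t-1}X_{t-2},\qquad \frac{\partial\sigma_t^2}{\partial\alpha} = \epsilon_{t-1}^2,$$

$$\frac{\partial^2\sigma_t^2}{\partial a^2} = 2\alpha X_{t-2}^2,\qquad \frac{\partial^2\sigma_t^2}{\partial a\,\partial\alpha} = -2\epsilon_{t-1}X_{t-2},\qquad \frac{\partial^2\sigma_t^2}{\partial\alpha^2} = 0.$$

It follows that

$$\frac{\partial\ell_t}{\partial a}(\varphi_0) = -(1 - \eta_t^2)\frac{2\alpha_0\epsilon_{t-1}X_{t-2}}{\sigma_t^2} - 2\eta_t\frac{X_{t-1}}{\sigma_t},\qquad \frac{\partial\ell_t}{\partial\alpha}(\varphi_0) = (1 - \eta_t^2)\frac{\epsilon_{t-1}^2}{\sigma_t^2},$$

$$\frac{\partial^2\ell_t}{\partial\alpha^2}(\varphi_0) = (2\eta_t^2 - 1)\frac{\epsilon_{t-1}^4}{\sigma_t^4},$$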
aL 8ao€1-1X:-1X;— Qa€;—1 X12) 
2 go) = -n DEALA =: 0€r-1Xr-2) 
OF Or 
20X? 2X2 
+a- =L, 
Or Or 
34 5 2a€; X12 Qage?_ X12 Jez X1 
(go) = pH dy) gp SS. 


4 E 3 
dada O; Ot oO; 
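Letting $\mathcal{I} = (\mathcal{I}_{ij})$, $J = (J_{ij})$ and $\mu_3 = E\eta_t^3$, we then obtain

$$\mathcal{I}_{11} = E_{\varphi_0}\left\{\frac{\partial\ell_t}{\partial a}(\varphi_0)\right\}^2,\qquad \mathcal{I}_{22} = E_{\varphi_0}\left\{\frac{\partial\ell_t}{\partial\alpha}(\varphi_0)\right\}^2 = (\kappa_\eta - 1)E_{\varphi_0}\frac{\epsilon_{t-1}^4}{\sigma_t^4}.$$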


2. In the case where the distribution of $\eta_t$ is symmetric we have $\mu_3 = 0$ and, using (7.24), $\mathcal{I}_{12} = J_{12} = 0$. It follows that

The asymptotic variance of the ARCH parameter estimator is thus equal to $(\kappa_\eta - 1)\{E_{\alpha_0}(\epsilon_{t-1}^4/\sigma_t^4)\}^{-1}$: it does not depend on $a_0$ and is the same as that of the QMLE of a pure ARCH(1) (using computations similar to those used to obtain (7.1.2)).


3. When $\alpha_0 = 0$, we have $\sigma_t^2 = 1$, and thus $EX_t^2 = 1/(1 - a_0^2)$. It follows that


a -2u J Ea "I 


2u} (ky — Dy Ky 


Z=- ( „17%, — u3 (1 — aĝ) /kKn ). 
—p3(1 aa aĝ) /Kn (Ky = 1)/kn 


We note that the estimation of too complicated a model (since the true process is AR(1) without ARCH effect) does not entail any asymptotic loss of accuracy for the estimation of the parameter $a_0$: the asymptotic variance of the estimator is the same, $1 - a_0^2$, as if the AR(1) model were directly estimated. This calculation also allows us to verify the '$\alpha_0 = 0$' column in Table 7.3: for the $\mathcal{N}(0, 1)$ law we have $\mu_3 = 0$ and $\kappa_\eta = 3$; for the normalized $\chi^2(1)$ distribution we find $\mu_3 = 4/\sqrt{2}$ and $\kappa_\eta = 15$.
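The values of $\mu_3$ and $\kappa_\eta$ quoted for the two laws can be checked exactly from raw moments. This minimal sketch uses the facts that $E Z^{2j} = 1 \cdot 3 \cdots (2j-1)$ for $Z \sim \mathcal{N}(0, 1)$, so that these are also the raw moments of $\chi^2(1) = Z^2$, and standardizes by the mean and standard deviation. The helper names are hypothetical.

```python
import math


def central_moment(raw, k, mean):
    """E(X - mean)^k from raw moments raw[j] = E X^j (raw[0] = 1), by binomial expansion."""
    return sum(math.comb(k, j) * raw[j] * (-mean) ** (k - j) for j in range(k + 1))


def skew_kurt(raw):
    """(mu_3, kappa) of the standardized variable (X - EX) / sd(X)."""
    mean = raw[1]
    var = central_moment(raw, 2, mean)
    return central_moment(raw, 3, mean) / var ** 1.5, central_moment(raw, 4, mean) / var ** 2


# Raw moments of chi2(1) = Z^2, Z ~ N(0,1): E X^j = 1 * 3 * ... * (2j - 1), E X^0 = 1.
chi2_raw = [math.prod(range(1, 2 * j, 2)) for j in range(5)]   # [1, 1, 3, 15, 105]
normal_raw = [1, 0, 1, 0, 3]                                    # N(0, 1)

mu3_chi2, kappa_chi2 = skew_kurt(chi2_raw)
mu3_norm, kappa_norm = skew_kurt(normal_raw)
```

The normalized $\chi^2(1)$ has $E(X-1)^3 = 8$, $E(X-1)^4 = 60$ and variance 2, which gives $\mu_3 = 8/2^{3/2} = 4/\sqrt{2}$ and $\kappa_\eta = 60/4 = 15$.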
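7.9 Let $\varepsilon > 0$ and $V(\theta_0)$ be such that (7.95) is satisfied. Since $\hat\theta_n \to \theta_0$ a.s., for $n$ large enough $\hat\theta_n \in V(\theta_0)$ a.s. We thus have almost surely

$$\begin{aligned}
\left\|\frac1n\sum_{t=1}^nJ_t(\hat\theta_n) - J\right\| &\le \left\|\frac1n\sum_{t=1}^nJ_t(\theta_0) - J\right\| + \frac1n\sum_{t=1}^n\|J_t(\hat\theta_n) - J_t(\theta_0)\|\\
&\le \left\|\frac1n\sum_{t=1}^nJ_t(\theta_0) - J\right\| + \frac1n\sum_{t=1}^n\sup_{\theta\in V(\theta_0)}\|J_t(\theta) - J_t(\theta_0)\|.
\end{aligned}$$

It follows that

$$\limsup_{n\to\infty}\left\|\frac1n\sum_{t=1}^nJ_t(\hat\theta_n) - J\right\| \le \varepsilon$$

and, since $\varepsilon$ can be chosen arbitrarily small, we have the desired result.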

In order to give an example where (7.95) is not satisfied, let us consider the autoregressive model $X_t = \theta_0X_{t-1} + \eta_t$ where $\theta_0 = 1$ and $(\eta_t)$ is an iid sequence with mean 0 and variance 1. Let $J_t(\theta) = X_t - \theta X_{t-1}$. Then $J_t(\theta_0) = \eta_t$, and the first convergence of the exercise holds true, with $J = 0$. Moreover, for all neighborhoods of $\theta_0$,

$$\frac1n\sum_{t=1}^n\sup_{\theta\in V(\theta_0)}|J_t(\theta) - J_t(\theta_0)| = \left(\frac1n\sum_{t=1}^n|X_{t-1}|\right)\sup_{\theta\in V(\theta_0)}|\theta - \theta_0| \to +\infty,\quad\text{a.s.},$$

because the sum in brackets converges to $+\infty$, $X_t$ being a random walk and the supremum being strictly positive. Thus (7.95) is not satisfied. Nevertheless, we have

$$\frac1n\sum_{t=1}^nJ_t(\hat\theta_n) = \frac1n\sum_{t=1}^n(X_t - \hat\theta_nX_{t-1}) = \frac1n\sum_{t=1}^n\eta_t - \sqrt{n}(\hat\theta_n - 1)\,n^{-3/2}\sum_{t=1}^nX_{t-1} \to J = 0,\quad\text{in probability}.$$

Indeed, $n^{-3/2}\sum_{t=1}^nX_{t-1}$ converges in law to a nondegenerate random variable (see, for instance, Hamilton, 1994, p. 406) whereas $\sqrt{n}(\hat\theta_n - 1) \to 0$ in probability since $n(\hat\theta_n - 1)$ has a nondegenerate limit distribution.


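7.10 It suffices to show that $J^{-1} - \theta_0\theta_0'$ is positive semi-definite. Note that $\theta_0'\{\partial\sigma_t^2(\theta_0)/\partial\theta\} = \sigma_t^2(\theta_0)$. It follows that

$$\theta_0'J = E(Z_t'),\qquad\text{where } Z_t = \frac{1}{\sigma_t^2(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}.$$

Therefore $J - J\theta_0\theta_0'J = \mathrm{Var}(Z_t)$ is positive semi-definite. Thus

$$y'J(J^{-1} - \theta_0\theta_0')Jy = y'(J - J\theta_0\theta_0'J)y \ge 0\quad\text{for all vectors } y.$$

Setting $x = Jy$ ($J$ being invertible), we then have

$$x'(J^{-1} - \theta_0\theta_0')x \ge 0\quad\text{for all vectors } x,$$

which proves the result.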
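7.11 1. In the ARCH case, we have $\theta_0'\{\partial\sigma_t^2(\theta_0)/\partial\theta\} = \sigma_t^2(\theta_0)$. It follows that

$$\Omega' = E\left\{\frac{1}{\sigma_t^2(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'}\right\} = \theta_0'E\left\{\frac{1}{\sigma_t^4(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'}\right\} = \theta_0'J,$$

or equivalently $\Omega = J\theta_0$, that is, $J^{-1}\Omega = \theta_0$. We also have $\theta_0'\Omega = 1$, and thus

$$1 = \theta_0'\Omega = \Omega'J^{-1}\Omega.$$

2. Introducing the polynomial $B_\theta(z) = 1 - \sum_{j=1}^p\beta_jz^j$, the derivatives of $\sigma_t^2(\theta)$ satisfy

$$B_\theta(L)\frac{\partial\sigma_t^2(\theta)}{\partial\omega} = 1,\qquad B_\theta(L)\frac{\partial\sigma_t^2(\theta)}{\partial\alpha_i} = \epsilon_{t-i}^2,\quad i = 1, \ldots, q,\qquad B_\theta(L)\frac{\partial\sigma_t^2(\theta)}{\partial\beta_j} = \sigma_{t-j}^2(\theta),\quad j = 1, \ldots, p.$$

It follows that

$$B_\theta(L)\left\{\omega\frac{\partial\sigma_t^2(\theta)}{\partial\omega} + \sum_{i=1}^q\alpha_i\frac{\partial\sigma_t^2(\theta)}{\partial\alpha_i}\right\} = \omega + \sum_{i=1}^q\alpha_i\epsilon_{t-i}^2 = B_\theta(L)\sigma_t^2(\theta).$$

In view of assumption A2 and Corollary 2.2, the roots of $B_\theta(L)$ are outside the unit disk, and the relation follows.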


3. It suffices to replace 69 by Bo in 1. 


7.12 Only three cases have to be considered, the other ones being obtained by symmetry. If 
ti < min{ty, t3, t4}, the result is obtained from (7.24) with g = 1 and t — j = tı. If b = t3 < 
ti < t4, the result is obtained from (7.24) with gle.. Laa ES E ,t =h adt- j =t. Iftn = 
t3 = t4 < tı, the result is obtained from (7.24) with gle? ; n) E J=0,h=h=t=t 
and f (erji, =) = &,- 


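7.13 1. It suffices to apply (7.38), and then to apply Corollary 2.1 on page 26.

2. The result follows from the Lindeberg central limit theorem of Theorem A.3 on page 345.

3. Using (7.39) and the convergence of $\epsilon_t^2$ to $+\infty$,

$$\frac1n\sum_{t=1}^n\frac{\partial^2}{\partial\alpha^2}\ell_t(\alpha_0) = \frac1n\sum_{t=1}^n(2\eta_t^2 - 1)\left(\frac{\epsilon_{t-1}^2}{1 + \alpha_0\epsilon_{t-1}^2}\right)^2 \to \frac{1}{\alpha_0^2}\quad\text{a.s.}$$

4. In view of (7.50) and the fact that $\partial^2\sigma_t^2(\alpha)/\partial\alpha^2 = \partial^3\sigma_t^2(\alpha)/\partial\alpha^3 = 0$, we have

$$\left|\frac{\partial^3}{\partial\alpha^3}\ell_t(\alpha)\right| = \left|\left\{2 - 6\frac{(1 + \alpha_0\epsilon_{t-1}^2)\,\eta_t^2}{1 + \alpha\epsilon_{t-1}^2}\right\}\left(\frac{\epsilon_{t-1}^2}{1 + \alpha\epsilon_{t-1}^2}\right)^3\right| \le \left\{2 + 6\left(1 + \frac{\alpha_0}{\alpha}\right)\eta_t^2\right\}\frac{1}{\alpha^3}.$$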
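5. The derivative of the criterion is equal to zero at $\hat\alpha_n$. A Taylor expansion of this derivative around $\alpha_0$ then yields

$$0 = \frac{1}{\sqrt{n}}\sum_{t=1}^n\frac{\partial}{\partial\alpha}\ell_t(\alpha_0) + \frac1n\sum_{t=1}^n\frac{\partial^2}{\partial\alpha^2}\ell_t(\alpha_0)\,\sqrt{n}(\hat\alpha_n - \alpha_0) + \frac1n\sum_{t=1}^n\frac{\partial^3}{\partial\alpha^3}\ell_t(\alpha^*)\,\frac{\sqrt{n}(\hat\alpha_n - \alpha_0)^2}{2},$$

where $\alpha^*$ is between $\hat\alpha_n$ and $\alpha_0$. The result easily follows from the previous questions.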


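6. When $\omega_0 \ne 1$, we have

$$\frac{\partial}{\partial\alpha}\ell_t(\alpha_0) = \left(1 - \frac{\epsilon_t^2}{1 + \alpha_0\epsilon_{t-1}^2}\right)\frac{\epsilon_{t-1}^2}{1 + \alpha_0\epsilon_{t-1}^2} = (1 - \eta_t^2)\frac{\epsilon_{t-1}^2}{1 + \alpha_0\epsilon_{t-1}^2} + d_t,$$

with

$$d_t = \eta_t^2\frac{(1 - \omega_0)\,\epsilon_{t-1}^2}{(1 + \alpha_0\epsilon_{t-1}^2)^2}.$$

Since $d_t \to 0$ a.s. as $t \to \infty$, the convergence in law of part 2 always holds true. Moreover,

$$\frac{\partial^2}{\partial\alpha^2}\ell_t(\alpha_0) = \left(2\frac{\epsilon_t^2}{1 + \alpha_0\epsilon_{t-1}^2} - 1\right)\left(\frac{\epsilon_{t-1}^2}{1 + \alpha_0\epsilon_{t-1}^2}\right)^2 = (2\eta_t^2 - 1)\left(\frac{\epsilon_{t-1}^2}{1 + \alpha_0\epsilon_{t-1}^2}\right)^2 + d_t^*,$$

with

$$d_t^* = 2\eta_t^2\,\frac{\omega_0 - 1}{1 + \alpha_0\epsilon_{t-1}^2}\left(\frac{\epsilon_{t-1}^2}{1 + \alpha_0\epsilon_{t-1}^2}\right)^2 = o(1)\quad\text{a.s.},$$

which implies that the result obtained in part 3 does not change. The same is true for part 4 because

$$\left|\frac{\partial^3}{\partial\alpha^3}\ell_t(\alpha)\right| = \left|\left\{2 - 6\frac{\epsilon_t^2}{1 + \alpha\epsilon_{t-1}^2}\right\}\left(\frac{\epsilon_{t-1}^2}{1 + \alpha\epsilon_{t-1}^2}\right)^3\right| \le \left\{2 + 6\left(\omega_0 + \frac{\alpha_0}{\alpha}\right)\eta_t^2\right\}\frac{1}{\alpha^3}.$$

Finally, it is easy to see that the asymptotic behavior of $\hat\alpha_n(\omega_0)$ is the same as that of $\hat\alpha_n(\omega)$, regardless of the value that is fixed for $\omega$.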


33 
uw 


IA 


7. In practice wg is not known and must be estimated. However, it is certainly impossible to 
estimate the whole parameter (wo, œo) without the strict stationarity assumption. Moreover, 
under (7.14), the ARCH(1) model generates explosive trajectories which do not look like 
typical trajectories of financial returns. 
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Consider a constant œ € (0, ag). We begin by showing that & (1) > a for n large enough. 
Note that 
a , 1 n 
âd) =arg min Qn(@), Qn(a) =-=) > {Er(a) — £(a0)}. 
ae[0,co) n aA 
We have 


— 1 > p of (a0) =: o? (a) 
owie | a 1} + iog 


= P (a) o? (ao) 
_ 1 ” K (ao aye? 1+ aer 
nS 1e ey 1+ aoe? ; 


In view of the inequality x > 1 + log x for all x > 0, it follows that 


2 
(œo — ayer) 


2 
1+ QE; _ 4 


1 
inf QO, (a) > log — Poti 1. 
inf Qn («) = log = 9 n + log + 


t=1 


For all M > 0, there exists an integer ty such that e > M for all t > ty. This entails that 


eee (ao — a)M 
liminf inf Q, (a) > log ————— 
n> a<a 1 + oM 
Since M is arbitrarily large, 
do~ a 


liminf inf Q, (œ) > log 


n>% a<@ ao 


+1>0 (C.10) 


provided that æ < (1 — e~!)ap. If œ is chosen so that the constraint is satisfied, the inequal- 
ities 
lim sup On (Gn) < lim sup On (ao) =0 


nC noo 


and (C.10) show that 


lim â >a as. (C.11) 
n> 


We will define a criterion O, asymptotically equivalent to the criterion Q,,. Since éi > 


OO a.s. as t > œ, we have for a Æ 0, 
lim Q,(a@) = lim O,(a@), 
n->oo noo 


where 


lv (ag — a) a 
On(a) = — Sna + log —. 
n a ao 


On the other hand, we have 


š ao a 
lim O,(@) = — — 1 + log — > 0 
noo a ay 
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when a/a Æ 1. We will now show that Q,(a@) — O,(a) converges to zero uniformly in 
a € (a, 00). We have 


Rie Og py a 
n(@) — O, (a) = = n? ————_.—. + log ————_—_-. 
n I a(l + ae?) È TF veZa 


Thus for all M >Q and any e > 0, almost surely 


|Qn(a) — On(a@)| < +e) 


> 


la —ao|  |a — aol 
a? M aagM 


provided n is large enough. In addition to the previous constraints, assume that œ < 1. 
We have |a — ag|/a2M = ay /a? M for any œ < œ < ap, and 


for any œ > ag. We then have 


Í 1 + g&o 1 +o 
I n = On < 1 . 
ole (@) (@)| < A +e) oM anM 


Since M can be chosen arbitrarily large and € arbitrarily small, we have almost surely 


lim sup |Q, (œ) — O,(a@)| = 0. (C.12) 


u ey oy 


For the last step of the proof, let ag and ag be two constants such that a) < ao < ag It 
can always be assumed that œ < a . With the notation ô? =n! 4 n? , the solution of 


až = arg min On (a) 


is œ% = a6, This solution belongs to the interval (a , ag ) when n is large enough. In 
this case 


*k 


Qn 


arg min O,(a@) 
agag ag) 


is one of the two extremities of the interval (a , ag ), and thus 


lim O,(a}*) = min Í lim O, (ai), lim On(ois)} > 0. 
n—-oco n> n—>oo 


This result, (C.12), the fact that ming Qn (œ) < Qn(ao) = O and (C.11) show that 


lim arg min Q, (œ) € (a , a). 
noo a>0 


Since (a , ag ) is an arbitrarily small interval that contains a and @, = arg ming Qn (œ), 
the conclusion follows. 


. It can be seen that the constant 1 does not play any particular role and can be replaced 
by any other positive number w. However, we cannot conclude that @, —> ap a.s. because 
&, = âS (Ôn), but Ô, is not a constant. In contrast, it can be shown that under the strict 
stationarity condition a < exp{— E (log n?)} the constrained estimator @(w) does not 
converge to œo when w Æ ap. 
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Chapter 8 


8.1 


8.2 


Let the Lagrange multiplier A € RP”. We have to maximize the Lagrangian 
L(x, A) = (x — xo) J (x — xo) +A' Kx. 
Since at the optimum 


S ƏL(x, À) 


0= =2Jx—2Jxo+ KA, 
ox 


the solution is such that x = x9 — 4J~!K’A. Since 0 = Kx = Kx — 4K J~'K’A, we obtain 
A= (KJK K xo, and then the solution is 


x = xo — JK (KJI K'Y" Kxo. 


Let K be the p x n matrix such that K(1, i1) =--- = K(p,ip)= 1 and whose the other 
elements are 0. Using Exercise 8.1, the solution has the form 
x= [In— KKI KEY E| (C.13) 
Instead of the Lagrange multiplier method, a direct substitution method can also be used. 
The constraints x; =- = Xi, = 0 can be written as 
x = Hx* 


where H isn x (n — p), of full column rank, and x* is (n — p) x 1 (the vector of the nonzero 
1 


components of x). For instance: (i) if n = 3, x2 = x3 = 0 then x* = x; andH=] 0 |; 
0 


(ii) if n = 3, x3 = 0 then x* = (x1, x2) and H = 


oor 
oro 


If we denote by Col(H) the space generated by the columns of H, we thus have to find 


min ||x — xolly 
xeCol(H) 


where ||.|| is the norm ||z||y = Jz’ Jz. 

This norm defines the scalar product (z, y); = z' Jy. The solution is thus the orthogonal 
(with respect to this scalar product) projection of x9 on Col(H). The matrix of such a 
projection is 

P=H(A'JH)'H’J. 


Indeed, we have P? = P, PHz = Hz, thus Col(H) is P-invariant, and (Hy, (I — P)z); = 
y'H'J(I — P)z = y'H'Jz — y'H'J H(H'J H)! H'Jz = 0, thus z— Pz is orthogonal to 
Col(H). 

It follows that the solution is 


x = Pxo = H(H'J HY 'H' J xo. (C.14) 
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This last expression seems preferable to (C.13) because it only requires the inversion of 


the matrix H’JH of size n — p, whereas in (C.14) the inverse of J, which is of size n, is 
required. 


8.3 In case (a), we have 
1 0 
K = (0,0, 1) and H=]|0 1 |; 
0 0 


and then 


jive f 2. St". 7-28: 1s 
aya =( 2, T aa. aa 


2/3 1/3 0 
H(H'JHY'H' =| 1/3 2/3 0 
0 0 O 
and, using (C.14), 
1 0 1/3 
P=H(A'JH)'H'J=| 0 1 2/3 
0 0 0 
which gives a constrained minimum at 
Xo1 + X03/3 
x= | X02 + X03/3 
0 
In case (b), we have 
0 1.0 1 
K= and H=j] 0 j; (C.15) 
0 0 1 0 


and, using (C.14), a calculation, which is simpler than the previous one (we do not have to 
invert any matrix since H’JH is scalar), shows that the constrained minimum is at 


xol — Xo2/2 
0 
0 


= 
II 


The same results can be obtained with formula (C.13), but the computations are longer, 
in particular because we have to compute 


3/4 1/2 =1/4 
J= 1/2 1 -1/2 |}. (C.16) 
—1/4 -1/2 3/4 
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8.4 Matrix JT! is given by (C.16). With the matrix Kı = K defined by (C.15), and denoting by 
K> and K3 the first and second rows of K, we then obtain 


f 1 -1/2 0 
B-J 'K' (KJK) K=|0 0 Of, 
0 0 0 
f 1 -1/2 0 
Ih —J7'K3(KoJ7'K3) Ko=]| 0 0 0 |, 
0 1/72 #1 
E 1 0 1/3 
B — JK} (K3J'K}) K3=| 0 1 2/3 
0 0 0 
It follows that the solution will be found among (a) A = Z, 
Zı — Z2/2 Zı — Z2/2 Zı + Z3/3 
(b) à = 0 A= 0 , (dà = | Z.+2Z3/3 
0 Z3 + Z2/2 (0) 


The value of Q(A) is 0 in case (a), 3Z2/2 + 2Z2Z3 + 2Z} in case (b), Z3 in case (c) and 


Z$ /3 in case (d). 


To find the solution of the constrained minimization problem, it thus suffices to take the 
value A which minimizes Q (à) among the subset of the four vectors defined in (a)—(d) which 
satisfy the positivity constraints of the two last components. 

We thus find the minimum at A“ = Z = (—2, 1, 2)’ in case (i), at 


—3/2 
w= 0 
3/2 


=5/2 


and 


—2 

in case (ii) where Z = —1 ‘ 
2 
—2 

in case (iii) where Z = 1 
—2 
—2 

in case (iv) where Z = —1 
—2 


8.5 Recall that for a variable Z ~ M(0, 1), we have EZ+ = —EZ~ = (27 )~!/ and Var(Z+) = 


Var(Z~) = $(1 — 1/7). We have 


Zı (Ky + las —w —0 
Z=| Z7 |~N{0, E= (k -DJ = wo 1 0 
Z3 — w0 0 1 
It follows that 
B= 


Habe, 
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The coefficient of the regression of Z; on Z2 is —wo. The components of the vector (Z; + 
@oZ2, Z2) are thus uncorrelated and, this vector being Gaussian, they are independent. In 
particular E(Z, + @)Z2)Z, = 0, which gives Cov(Z), Z3) = EZZ} = —woE(Z2Z,) = 
—wy E(Z;)* = —wo/2. We thus have 


= = 2 2 1 2 1 2 
Var(Zı + woZ, + Z3 ) = (ky + Dog +| 1- = 209 = | Ky — — | @- 
£ T 


T 
Finally, 
1 1\ [25T -o -o 
Var(A*) = (1 = =) —a 1 0 
= 0 1 
It can be seen that 
1 1 20}, =o) — w0) 
Var(Z) — Var(A*) = 5 (: + 1) —w 1 0 
g =y 0 1 


is a positive semi-definite matrix. 


8.6 At the point 4) = (œo, 0, ..., 0), we have 


da; (80) 


96 = (1, won? is +- oona) 


and the information matrix (written for simplicity in the ARCH(3) case) is equal to 


J (@) = Ea ( Jae) me) 
of (o) 30 00 

1 wo wo wo 

1 Mo OK n ws, w 

Shl o g a ok 

wo og wp WK 


This matrix is invertible (which is not the case for a general GARCH(p, q)). We finally 
obtain 


(kn +q- 1a? -—O +++ -—@ 
=w 


(Oo) = (ky — 1)J (00)! = 


=O 


8.7 We have o? = w + ae? |, 007/00" = (1, e? |) and 


t-1? 


1 3o? ðo? 1 1 e2 1 1 o% 
marion ld, Eaa dh) 
o, 90 06 or (60) \ E1 Eri wg \ 20 Ky 


and thus 
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In view of Theorem 8.1 and (8.15), the asymptotic distribution of Jn (6 — 0o) is that of the 
vector A“ defined by 


A Z| = —Wo _ Zi +Z, 
À =( Z Z3 1 = zt ‘i 


We have EZ} = —EZy = (2n)~1/”, thus 


Since the components of the Gaussian vector (Z; + woZ2, Z2) are uncorrelated, they are 
independent, and it follows that 


Cov(Z1, Z} ) = —wpCov(Z2, Z, ) = —wo/2. 


We then obtain 


2 7 f 
-%03 (1 — =) zs) 

Let f(Z1, z2) be the density of Z, that is, the density of a centered normal with variance 
(kK, — 1)J =l Tt is easy to show that the distribution of Z; + @oZ, admits the density h(x) = 
Io” fe, z2)dz2 + ae f(x — w922, Z2)dz2 and to check that this density is asymmetric. 

A simple calculation yields E (z7) = —2//2n. From E(Z, + @9Z2) (zzy =0, 
we then obtain EZı(Z7)? =2œoo/vV2m. And from E(Zı +@oZ2)? (Z7) = E(Zi + 
wy Zz) E (Zz) we obtain EZ? Z} = ~œ (k + 1)/ V27. Finally, we obtain 


E(Zı + wZz)* = 3@9 EZ{Zz + 305 EZ) (Za +E (zy 


8.8 The statistic of the C test is M(0, 1) distributed under Hp. The p-value of C is thus 1 — 
® (n! X}; X;). Under the alternative, we have almost surely n~! $; _; X; ~ yno as 
n —> +00. It can be shown that log {1 — ®(x)} ~ —x?/2 in the neighborhood of +00. In 
Bahadur’s sense, the asymptotic slope of the C test is thus 


-2 —(/ne) 
n> Nn 2 


67, 6>0. 


The p-value of C* is 2(1 — ®(|n~'/? X; X;|). Since log2{1 — &(x)} ~ —x?/2 in the 
neighborhood of +00, the asymptotic slope of C* is also c*(0) = 67 for 6 >0. The C and 
C* tests having the same asymptotic slope, they cannot be distinguished by the Bahadur 
approach. 

We know that C is uniformly more powerful than C*. The local power of C is thus also 
greater than that of C* for all t >0. It is also true asymptotically as n — oo, even if the 
sample is not Gaussian. Indeed, under the local alternatives t/,/n, and for a regular statistical 
model, the statistic n~!/2 $; Xi is asymptotically M(t, 1) distributed. The local asymptotic 
power of C is thus y(t) = 1 — ®(c — t) with c = ®—!(1 — æ). The local asymptotic power 
of C* is y*(t) = 1 — ® (œ — t) + ® (—c* — t), with c* = 7! (1 — a /2). The difference 
between the two asymptotic powers is 


D(t) = y(t) — y* (t) = —® (c — t) + ® (œ — t) — © (—c* — T) 


8.9 
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and, denoting the M(0, 1) density by ¢(x), we have 


D'(t) =¢ (c = T) — 6 (c —t) +6 (= — t) = (te g(t), 


where ; 
g(t) =e ° 7 4 


eer n ( oe —c)t erie | . 
Since 0 < c < c*, we have 
2 toe “(ete 
g(t) =e? [=e =e — (c* 4 ee orgel <0. 


Thus, g(t) is decreasing on [0, oo). Note that g(0) > 0 and lim,.4.. g(t) = —oo. The sign 
of g(t), which is also the sign of D’(t), is positive when t € [0,a] and negative when 
T € [a, œ), for some a > 0. The function D thus increases on [0, a] and decreases on [a, co). 
Since D(O) = 0 and lim;-4+4.. D(t) = 0, we have D(t) >0 for all t > 0. This shows that, 
in Pitman’s sense, the test C is, as expected, locally more powerful than C* in the Gaussian 
case, and locally asymptotically more powerful than C* in a much more general framework. 


The Wald test uses the fact that 
Jn(Xn —0)~N(O,o7) and S? —> o? as. 


To justify the score test, we remark that the log-likelihood constrained by Hp is 


which gives Px /n as constrained estimator of o°. The derivative of the log-likelihood 


satisfies ij 
1 a A HANA 
> log L (0, 0°) = | —— 5,0 
ag a ee) — ) 
at (0,07) = (0, D X? /n). The first component of this score vector is asymptotically M(0, 1) 
distributed under Ho. The third test is of course the likelihood ratio test, because the uncon- 
strained log-likelihood at the optimum is equal to —(n/2) log S2 — (n/2) whereas the max- 
imal value of the constrained log-likelihood is —(n/2) log X` x /n — (n/2). Note also that 
Ln =nlog(1 + X,/S2) ~ W, under Ho. 
The asymptotic level of the three tests is of course a, but using the inequality iz < 
log(1 +x) < x for x >0, we have 


R, < Ln < Wn, 


with almost surely strict inequalities in finite samples, and also asymptotically under H;. This 
leads us to think that the Wald test will reject more often under H. 

Since 5 is invariant by translation of the X;, sS? tends almost surely to o both under 
Ho and under Hj, as well as under the local alternatives H,,(t) : 6 = t/,/n. The behavior of 
D X?/n under H, (T) is the same as that of X (X; + t/./n)?/n under Ho, and because 


Eha) SS TAa 
= ;+—=) =- > + — +2—X, > o° as. 
n ‘Mn nia "on Jn” 


under Ho, we have E X?/n — o? both under Ho and under H,,(t). Similarly, it can be 
shown that X,,/S, — 0 under Hp and under H,,(t). Using these two results and x/(1 + x) ~ 
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log(1 + x) in the neighborhood of 0, it can be seen that the statistics L,, R, and W, are 
equivalent under H,,(t). Therefore, the Pitman approach cannot distinguish the three tests. 
Using log P( x >x) ~ —x/2 for x in the neighborhood of +o0ọ, the asymptotic Bahadur 
slopes of the tests C1, C2 and C3 are respectively 
= 


fan Sa 2 
aut?) z p re log POG a Wa) = a S2 = oe 


p= pare e der H 
c20) = FE c3(0) = log m under i], 
Clearly 
0 
c2(0) = H < c3(8) = log {1 + cı (8)} < c1 (8). 


Thus the ranking of the tests, in increasing order of relative efficiency in the Bahadur sense, is 
score < likelihood ratio < Wald. 


All the foregoing remains valid for a regular non-Gaussian model. 


8.10 In Example 8.2, we saw that 


yı 
E(Z4Zi 
2^ =Z- Ze, eS : 7 pe i) 
' Var(Za) 


Yd 


Note that Var(Z,)e corresponds to the last column of VarZ = (xk, — 1)J -1 Thus ¢ is the 
last column of J~! divided by the (d, d)th element of this matrix. In view of Exercise 6.7, 
this element is (J22 — Jordy tia). It follows that Je = (0,...,0, J2 — Ja Ji Jay and 
eJe= Jz — Judy Jn. By (8.24), we thus have 


L=—l(z-y ty, | 24 Sdad d. 
= z ade oT 5 a 22 21411 12) 


1 = 
= 5 (Zi) O2 = Jn Ji! J2) = 


k,—1 (Zt)? 
2 Var(Za) 


This shows that the statistic 2/(«, — 1)L, has the same asymptotic distribution as the Wald 
statistic W,,, that is, the distribution 59/2 + x? /2 in the case dz = 1. 


8.11 Using (8.29) and Exercise 8.6, we have 


The result then follows from (8.30). 


8.12 Since XY = 0 almost surely, we have P(XY Æ 0) = 0. By independence, we have P(XY # 
0) = P(X £0 and Y 40) = P(X £0)P(Y + 0). It follows that P(X #40) = 0 or P(Y + 
0) =0. 
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Chapter 9 


9.1 Substituting y = x/o;, and then integrating by parts, we obtain 
| foo + f yf'Ody=1+ lim Df), — f fO)dy =0. 


Since o? and do? /00 belong to the o-field F,_; generated by {€„ : u < t}, and since the 
distribution of €, given F,—ı has the density or! f(-/c:), we have 


ð 
E £ — log L, F (0 
| 0 og Ln, ¢ (80) 


F} =0, 


and the result follows. We can also appeal to the general result that a score vector is centered. 


9.2 It suffices to use integration by parts. 


9.3 We have j ; 5 
(x = 0) (x = o) 0% — 9 (0 — 00) 
A(0, 6, x) = —————_. + —————_ = ~> ; 
(8, 8o, x) 20? i 202 202 Ta o? 
Thus 
aX +b i abo + b ao? a(O — 4) 
AO, 606%) -igt Pl upsa at 


when X ~ M(®o, 07), and 


( aX +b ) ~i( að +b ) ao? a(6 — 6) 
A(O, 6, X) Cor TN a6 —%) Er 


when X ~ MO, 07). Note that 
Eo(aX + b) = Eq (aX + b) + Cova, {aX + b, A, 0o, X)}, 


as in Le Cam’s third lemma. 


9.4 Recall that 
€ Er 


a 
gg ELA) = 2 à (:-|: 


and 


Ot 


ð Ji 
gy O8 Lnn O) = >R |; + ( _ 


t=1 


Using the ergodic theorem, the fact that (1 — In) is centered and independent of the past, 
as well as elementary calculations of derivatives and integrals, we obtain 


2 


_; 0 1 
=n! ap log Ln, p, (Oo) =n Ya 


A 1 ; 
To —— 0) to) > Tu 


2 


ð 
-n log Ln, 7,0) = nS iet log In| — 


A 1 
aa 2 £6) + 0( )> Jn, 


rea 
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and 
2 n 


12 1 1 2 a 
=n! =z log Ln, f, (O0) = n7 Ehtne (log n1) | om 


t=1 
almost surely. 
Jensen’s inequality entails that 
of (ne) 
fm) 


of (no) 
f) 


E logo f (no) — E log f(n) = 


< log E 


= log / oii 


where the inequality is strict if o f (no )/ f (ņ) is nonconstant. If this ratio of densities were 
almost surely constant, it would be almost surely equal to 1, and we would have 


Eln? = f Ix!" f(x)dx = / e T / Ixl" f(x)dx = Elnl’/o", 


which is possible only when o = 1. 


2 2 2 
It suffices to note that Te fy = "Fy. fy (= Ep; — 1). 


The second-order moment of the double ['(b, p) distribution is p(p + 1)/b*. Therefore, 
to have En? = 1, the density f of n, must be the double T (Vpr + 1), p). We then 


obtain x) 
+t s=p- Vv P(p + IIx]. 


Thus Îp =p and Te =1/p. We then show that x, := f x*f,(x)dx = (3 + p)(2 + 
p)/p(p + 1). It follows that t6 ,/t? p = (3 +2p)/(2 + 2p). 

To compare the ML and Laplace QML, it is necessary to normalize in such a 
way that E|7,| = 1, that is, to take the double F (p, , P) as density f. We then obtain 
pe we = p — p|x|. We always have lps = (= = p (En? — 1) = p, and we have 

= (E ne — 1) =1/p. It follows that te lth f = l, which was already known from 
ee 9.6. This allows us to construct a table similar to Table 9.5. 


Consider the first instrumental density of the table, namely 
h(x) = c|x|*~! exp(—Alx|"/r), à>0. 
Denoting by c any constant whose value can be ignored, we have 


(echo beret = Dive “Ge 
3 


À : 
gi(x,0) = —— ho" x1", 
o 


A 
820,0) = -5 —A(r— Do” |x|’, 
o 
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and thus j 5 
2 _ EQ=Imi)? Elm” -1 
Th, f = 72 = ) : 
Now consider the second density, 
h(x) = c|x|-*~| exp (—Alx|7"/r) ; 
We have 


g(x, 0o) =logo + (—A— 1) logo |x| — Ao” |x| Se 
F 


ZÀ =r=1 =r 
81(xX,0) = — +o "|x|", 
Oo 


—r—2 


Xr : 
ga, o) = L — A+ DoT 217, 
Oo 


which gives 


2 _ EA- In 


ae = 7 


Consider the last instrumental density, 


A(x) = c|x|~' exp {—A(log |x|)?} . 


We have 
g(x,o) = logo — loga|x| — à log? (a |x|) +é; 
À 
g(x, 0) = —2— log(o|z\), 
À À 
g2(x,0) = 25 log(o|x|) -24, 
o o? 
and thus 


Th, ¢ = E(log |n). 
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In each case, A f does not depend on the parameter A of h. We conclude that the estimators 


A 


n.n exhibit the same asymptotic behavior, regardless of the parameter À. It can even be easily 


shown that the estimators themselves do not depend on i. 


9.9 1. The Laplace QML estimator applied to a GARCH in standard form (as defined in 


Example 9.4) is an example of such an estimator. 


2. We have 
q P 
es 
07 (6*) = ao + >> oaie + >> Bojo7_; (6) 


i=l j=l 
-1 


P q 
= o? j= X Boj B! (a + Sane} = 0°07 (00). 
j=1 i=1 
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3. Since 
1 
ery 
2 P a : 
3o? (0) 
a =|1-) 6,3) a , 
j=l o,_,(0) 
07») 
we have 
do7(0*) 2A 3o? (60) 1 d07(6*) | 1 40760) 
aa °° 96 ° o2*) 00 °o2@) 90 


It follows that, using obvious notation, 


=f 2 
J(0*) = Np I(o) Ao =( oi) 0J) 


o*J21(80)  J22(00) 
9.10 After reparameterization, the result (9.27) applies with n, replaced by nf} = n;/o, and 6o by 
0* = (9709, 0’ a01, ---, 0° Q0q, Bor, -- ++ Bop)’: 
where 9 = f |x| f(x)dx. Thus, using Exercise 9.9, we obtain 
A = £ sbgai 
Va (bae = Ag'60) 5 N (0, 4r? paz! Iaz") 


with t7 ; = En? — 1 = En?/g? — 1. 


Chapter 10 


10.1 Note that o; is a measurable function of i es sie beg Hs and of Ais A 


Zr- -> that will 
be denoted by 


2 2 2 2 
Vlissa Nih Gn Etha +++) 


Using the independence between n,_, and the other variables of h, we have, for all h, 


Cov (or, E-r) = E {E (6¢€:—-h | Mai, «++ M—n4tds Gt—h-1s &—-h-2s +++ )} 


when the distribution P, is symmetric. 


10.2 A sequence X; of independent real random variables such that X; = 0 with probabil- 
ity 1/@ +1) and X; = (i + 1)/i with probability 1 — 1/(i + 1) is suitable, because Y := 
IIZ; Xi =O0as., EY =0, EX; =1 and [| EX; = 1. We have used P(lim, | An) = 
lim, | P(A,) for any decreasing sequence of events, in order to show that 


PY £0) =| [FPG #0) =o {Ye -= vol =0. 


i=1 i=l 


10.3 


10.4 


10.5 
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By definition, 


Į [£ expte) = lim J [E expia on 


i=1 i=l 


and, by continuity of the exponential, 


i=1 


Jim Į [E exp{lag 0h = opl im > log E exp{lài8 01) 


i=1 
is finite if and only if the series of general term log E exp{|A; g(n;)|} converges. Using the 
inequalities exp(EX) < E exp(X) < E exp(|X|), we obtain 


Ai Eg (m) < log gy (Ai) < log E exp{|Aig(7)I}- 


Since the A; tend to 0 at an exponential rate and E|g(n:)| < oo, the series of general term 
4; Eg(n;) converges absolutely, and we finally obtain 


[oe] CO CO 
D> flog gn (Ai)| < Doi Eg(n)l + Do log E exp{làig l}, 
i=l 


i=l i=l 
which is finite under condition (10.6). 


Note that (10.7) entails that 


n n 
2 A s * 92 r 
6 =e? ni Jim | [exp(aig(nra)} = e” ne exp [am Yasan] 


i=l i=l 


with probability 1. The integral of a positive measurable function being always defined in 
Rt U {+00}, using Beppo Levi’s theorem and then the independence of the n,, we obtain 


n 
Ee? < Ee® n? exp | im, t + rian 


i=1 


= fut [|E exp tigi, 
i=l 


which is of course finite under condition (10.6). Applying the dominated convergence 
theorem, and bounding the variables exp {J Aig(m-—i)} by the integrable variable 
exp {)-7°) lArg(m_—i)|}, we then obtain the desired expression for Ee?. 


Denoting by ¢ the density of n ~ M(0, 1), 
a hes a2 /2 7/2 
Eel =| ep (x)dx =2 f e*l p(x — A)dx = 2e* POA) 
0 0 


and E|n| = /2/z. With the notation t = |0| + |s|, it follows that 


2 jr? 
Ee 8l < elAslElnl pelti — exp | |Ac|,/—+ 2(|A|r). 
= T 2 


It then suffices to use the fact that ®(x) is equivalent to 1/2 + x¢ (0), and that log2®(x) 
is thus equivalent to 2x¢ (0), in a neighborhood of 0. 
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We can always assume ç = 1. In view of the discussion on page 249, the process X, = log e? 
satisfies an ARMA (1, 1) representation of the form 


Xı — BiX-1 = 0 + ur + bur, 
where u, is a white noise with variance oĉ. Using Var (log n?) = x?/2 and 


log 16 
Vm 


2 2 2 
Var{g (m) =1- = +8", Cov fg (n,), log nz} = 


the coefficients |b| < 1 and o? are such that 


og (1 +b?) = Var flog n? + aig (n1) — Bi logn?_,} 


2 (1+ B,? 2 log 16 
a Uae 2) iG tI z) 241p E 
2 T Vm 


and 


log16 x By 
V2 2 


When, for instance, œ = 1, 6) = 1/2, a, = 1/2 and 6 = —1/2, we obtain 


bo? = a,Cov fe), log ne} — Bı Var {log nr} =a 


X, — 0.5X;-1 = 1 + u; — 0.379685u;-1,  Var(u;) = 0.726849. 


In view of Exercise 3.5, an AR(1) process X; = aX;-1 + m, la| < 1, in which the noise 
n, has a strictly positive density over R, is geometrically 6-mixing. Under the station- 
arity conditions given in Theorem 10.1, if 7, has a density f >0 and if g (defined in 
(10.4)) is a continuously differentiable bijection (that is, if 6 ~ 0) then (log 07) is a geo- 
metrically -mixing stationary process. Reasoning as in step (iv) of the proof of Theorem 
3.4, it is shown that nZ and then (e€;), are also geometrically -mixing stationary 
processes. 


Since e} = o,n and e7 = on; , we have 


o = 0 +a) aM) =Ni = -n + Bi. 
If the volatility o; is a positive function of {ņ„, u < t} that possesses a moment of order 2, 
then 
{1 — Ea’ (n,)} Eo? =w% + 2wEo,Ea(y;) >0 


under conditions (10.10). Thus, condition (10.14) is necessarily satisfied. Conversely, under 
(10.14) the strict stationarity condition is satisfied because 


1 1 
Eloga(n) = 5E1080 (n) < 5 log Ea? (n,) < 0, 


and, as in the proof of Theorem 2.2, it is shown that the strictly stationary solution possesses 
a moment of order 2. 
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10.9 Assume the second-order stationarity condition (10.15). Let 


10.10 


10.11 


a(n) = o4,4n7 — on; + Bi, 
Hin = Eln = 2/2, 
Hyt = Ent = En; = 1/27, 
Ho, je, (h) = E (or|€r-nl). 


Me) = Ele = Hin Ho 


and 


ho = Eo, = — 25, 
1— Te 4 +a) Pi 


1 
Ma = Ea(n) = Jue ay) + Bi, 


1 2B 
Ha = 5+ tat) 4 Jar Ma +a,-) + pi, 
wo + 2@balle 
Lg = ——. 
l=- 


Using o; = w + a(m~1)07+-1, We obtain 


Ho, je (h) = Ohje) + Malls, jeh — 1), Wh = 2, 


1 2 
Ho, je (1) = OM )n| Ho + [zes + a;,-) + (2a lo2» 


2 
Ho, \e|(O) = ,/ — Ho2- 
x 


We then obtain the autocovariances 


Vielh) := Cov (ler, ern) = Mniko, jeh) — Hi 


and the autocorrelations pje(h) = Vje (h)/Vje (0). Note that yje (h) = Mayje\(h — 1) for all 
h> 1, which shows that (|e;|) is a weak ARMA(1, 1) process. In the standard GARCH 
case, the calculation of these autocorrelations would be much more complicated because o, 
is not a linear function of op—1. 


This is obvious because an APARCH(1, 1) with ô= 1, œi = &œı(l1 — çı) and aj = 
æ (l1 + çı) corresponds to a TGARCH(1, 1). 


This is an EGARCH(1,0) with ¢c=1, œi(0 +1)=@œ4,, a,(@—1)=—a_ and 
œ — œ E|n:| = œo. It is natural to impose œ+ > 0 and œ- > 0, so that the volatility 
increases with |ņ:—1|. It is also natural to impose a_ > a+ so that the effect of a negative 
shock is more important than the effect of a positive shock of the same magnitude. There 
always exists a strictly stationary solution 

€r = m exp(ao) {exp(æ+n 1) ln, 120} + exp(—o— n1) ln, 1 <0} } , 


and this solution possesses a moment of order 2 when 


E exp(æ+n) lm >00 <œ and Eexp(—a_n;) lin <0} < ©, 
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which is the case, in particular, for n, ~ M(0, 1). In the Gaussian case, we have 
Eo, =e {e8?o(a,) +e? o(a_)}, 
Ee? = Eo? = & [e @00,) + 2 &(20_) 
and 
Cov(o;, 1) = Eo;n:—1 E041 
= e" fel? (a4) + a4 Dla) = e7 fola) + a_O(a_)} | Eo, 


using the calculations of Exercise 10.5 and 
[e.e] . 2 2 [e.e] 2 2 
J xe p(adx = 2? f (y FAPO)dy = P (PA) HAPA). 
0 -À 


Since x œ> ġ(x)+x®(x) is an increasing function, provided a_ >a 4, we observe the 
leverage effect Cov(o;, €:-1) < 0. 


Chapter 11 


11.1 The number of parameters of the diagonal GARCH(p, q) model is 


m(m + 1) 
—z l +pt+q), 


that of the vectorial model is 


m(m + 1) 


m(m + 1) 
a) 


1 
{1 +(p+a) 5 
that of the CCC model is 
m(m—1 
m {1 + (p + q)m*} + an, 
that of the BEKK model is 


m(m + 1) 
2 


For p = q = 1 and K = 1 we obtain Table C.9. 
11.2 Assume (11.83) and define U(z) = By (2) Bg, (z). We have (11.84) because 


+ Km?(p +q). 


U (2) Ao (2) = Bo (2) Bj (2) Aa 2) = Bo (z)Bo(z)'Ag(z) = Ao (2) 


and 
U (2)Bo (2) = Bo (2)By,' (2) Bey 2) = Baz). 


Conversely, it is easy to check that (11.84) implies (11.83). 


11.3 1. Since X and Y are independent, we have 
0 = Cov(X, Y) = Cov(X, X) = Var(X), 


which shows that X is constant. 
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Table C.9 Number of parameters as a function of m. 


Model m=2 m=3 m=5 m= 10 

Diagonal 9 18 45 165 

Vectorial 21 78 465 6105 

CCC 19 60 265 2055 

BEKK 11 24 96 186 
2. We have 


P(X = x1) P(X = x2) = P(X = x1) P(Y = x2) = P(X = x1, Y = x) 
= P(X =x, X = x2) 
which is nonzero only if x; = x2, thus X and Y take only one value. 
3. Assume that there exist two events A and B such that P(X € A)P(X € B)>0 and 
AM B = Ø. The independence then entails that 
P(X € A)P(X € B)= P(X € A)P(Y EB) = P(X EA,X EB), 
and we obtain a contradiction. 


11.4 For all x e R™("+)/2) there exists a symmetric matrix X such that x = vech(X), and we 
have 
Di Dnx = D} Dmvech(X) = Di vec(X) = vech(X) = x. 


m 


11.5 The matrix A’A being symmetric and real, there exist an orthogonal matrix C (C'C = CC’ = 
I) and a diagonal matrix D such that A'A = CDC’. Thus, denoting by À; the (positive) 
eigenvalues of A'A, we have 


d 
x'A'Ax = x/CDC'x = y'Dy = Vay, 
j=l 


where y = C’x has the same norm as x. Assuming, for instance, that 4; = p(A’A), we 


have 
d d 
2 2 2 
sup ||Ax|7 = sup )\Ajy; <A1 Joo} SA. 
xls Ist Gay “= 
Moreover, this maximum is reached at y = (1, 0,..., 0)’. 


An alternative proof is obtained by noting that || A||? solves the maximization problem 
of the function f(x) = x’A’Ax under the constraint x’x = 1. Introduce the Lagrangian 


L(x, a) =x! A’ Ax — A(x'x — 1). 


The first-order conditions yield the constraint and 


L(x, A) 


ox 


= 2A'Ax — 2Ax = 0. 


This shows that the constrained optimum is located at a normalized eigenvector x; associated 
with an eigenvalue A; of A'A, i =1,...,d. Since f(x;) = x; A' Axi = Ajxjxj; = Aj, we of 
course have ||A||? = max; 


11.6 Since all the eigenvalues of the matrix A'A are real and positive, the largest eigenvalue of 
this matrix is less than the sum of all its eigenvalues, that is, of its trace. Using the second 
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11.7 


11.8 
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equality of (11.49), the first inequality of (11.50) follows. The second inequality follows 
from the same arguments, and noting that there are d2 eigenvalues. The last inequality uses 
the fact that the determinant is the product of the eigenvalues and that each eigenvalue is 
less than || A|J?. 

The first inequality of (11.51) is a simple application of the Cauchy—Schwarz inequality. 
The second inequality of (11.51) is obtained by twice applying the second inequality of 
(11.50). 


For the positivity of H, for all £ >0 it suffices to require Q to be symmetric positive 
definite, and the initial values Ho,..., Hı—p to be symmetric positive semi-definite. Indeed, 
if the H;_; are symmetric and positive semi-definite then H, is symmetric if and only Q is 
symmetric, and we have, for all A € R”, 


q P 
NHA=NOA+ Ya; eas. YO Bid Hy jh > NOK. 
i=l j=l 


We now give a second-order stationarity condition. If H := E(e€/) exists, then this 
matrix is symmetric positive semi-definite and satisfies 


q P 
H=Q+) aH +Y $;H, 
i=l j=l 


that is, 


If Q is positive definite, it is then necessary to have 


q P 
Xat) Bj <1. (C.17) 
i=l j=l 


For the reverse we use Theorem 11.5. Since the matrices C are of the form c;/, with 
ci = a; + Bj, the condition p()-;_, C) < 1 is equivalent to (C.17). This condition is thus 
sufficient to obtain the stationarity, under technical condition (ii) of Theorem 11.5 (which 
can perhaps be relaxed). Let us also mention that, by analogy with the univariate case, it is 
certainly possible to obtain the strict stationarity under a condition weaker than (C.17) but, 
to out knowledge, this remains an open problem. 


For the convergence in L”, it suffices to show that (u,,) is a Cauchy sequence: 


lun —Unllp < Wun unl, H |lun+2 ünyıllp H- lum Um—1|p 


< (m—n)Cl/P "P > 0 


when m >n — oo. To show the almost sure convergence, let us begin by noting that, using 
Hölder’s inequality, 


1 
E [un = Un—1| a {E |Un = ini? } H = Cp 


with $C^*=C^{1/p}$ and $\rho^*=\rho^{1/p}$. Let $v_1=u_1$, $v_n=u_n-u_{n-1}$ for $n\ge2$, and $v=\sum_{n=1}^\infty|v_n|$,
which is defined in $\mathbb{R}\cup\{+\infty\}$, a priori. Since

$$Ev\le C^*\sum_{n=1}^\infty\rho^{*n}<\infty,$$

it follows that $v$ is almost surely defined in $\mathbb{R}^+$ and $u=\sum_{n=1}^\infty v_n$ is almost surely defined
in $\mathbb{R}$. Since $u_n=v_1+\dots+v_n$, we have $u=\lim_{n\to\infty}u_n$ almost surely.

11.9 It suffices to note that $pR+(1-p)Q$ is the correlation matrix of a vector of the form
$\sqrt{p}X+\sqrt{1-p}Y$, where $X$ and $Y$ are independent vectors with respective correlation
matrices $R$ and $Q$.

11.10 Since the $\beta_j$ are linearly independent, there exist vectors $\alpha_k$ such that $\{\alpha_1,\dots,\alpha_m\}$ forms
a basis of $\mathbb{R}^m$ and such that $\alpha_k'\beta_j=\mathbb{1}_{\{j=k\}}$ for all $j=1,\dots,r$ and all $k=1,\dots,m$. (The $\beta_1,\dots,\beta_r$ can be extended to obtain a basis of $\mathbb{R}^m$. Let $B$ be the $m\times m$ matrix of these vectors in the canonical basis. This matrix $B$ is necessarily invertible, and we can take the rows of $B^{-1}$ as the vectors $\alpha_k$.) We then have

$$\operatorname{Var}(\alpha_j'\epsilon_t\mid\epsilon_u,u<t)=\alpha_j'H_t\alpha_j=\alpha_j'\Omega\alpha_j+\lambda_{jt},$$

and it suffices to take

$$\Omega^*=\Omega-\sum_{j=1}^r(\alpha_j'\Omega\alpha_j)\beta_j\beta_j'.$$

The conditional covariance between the factors $\alpha_j'\epsilon_t$ and $\alpha_k'\epsilon_t$, for $j\ne k$, is

$$\alpha_j'H_t\alpha_k=\alpha_j'\Omega\alpha_k,$$

which is a nonzero constant in general.

11.11 As in the proof of Exercise 11.10, define vectors $\alpha_j$ such that $\alpha_j'\beta_k=\mathbb{1}_{\{j=k\}}$, so that
$\alpha_j'H_t\alpha_j=\alpha_j'\Omega\alpha_j+\lambda_{jt}$. We then have

$$H_t=\Omega+\sum_{j=1}^r\beta_j\{\omega_j+a_j\alpha_j'\epsilon_{t-1}\epsilon_{t-1}'\alpha_j+b_j(\alpha_j'H_{t-1}\alpha_j-\alpha_j'\Omega\alpha_j)\}\beta_j',$$

and we obtain the BEKK representation with $K=r$,

$$\Omega^*=\Omega+\sum_{j=1}^r\{\omega_j-b_j\alpha_j'\Omega\alpha_j\}\beta_j\beta_j',\qquad A_j=\sqrt{a_j}\,\beta_j\alpha_j',\qquad B_j=\sqrt{b_j}\,\beta_j\alpha_j'.$$

11.12 Consider the Lagrange multiplier $\lambda_1$ and the Lagrangian $u_1'\Sigma u_1-\lambda_1(u_1'u_1-1)$. The first-order conditions yield

$$2\Sigma u_1-2\lambda_1u_1=0,$$

which shows that $u_1$ is an eigenvector associated with an eigenvalue $\lambda_1$ of $\Sigma$. Left-multiplying the previous equation by $u_1'$, we obtain

$$\lambda_1=\lambda_1u_1'u_1=u_1'\Sigma u_1=\operatorname{Var}(u_1'X),$$

which shows that $\lambda_1$ must be the largest eigenvalue of $\Sigma$. The vector $u_1$ is unique, up to its
sign, provided that the largest eigenvalue has multiplicity one.

An alternative way to obtain the result is based on the spectral decomposition of the
symmetric positive definite matrix

$$\Sigma=P\Lambda P',\qquad PP'=I_m,\qquad\Lambda=\operatorname{diag}(\lambda_1,\dots,\lambda_m),\qquad0<\lambda_m\le\dots\le\lambda_1.$$

Let $v_1=P'u_1$, that is, $u_1=Pv_1$. Maximizing $u_1'\Sigma u_1$ is equivalent to maximizing $v_1'\Lambda v_1$.
The constraint $u_1'u_1=1$ is equivalent to the constraint $v_1'v_1=1$. Denoting by $v_{i1}$ the components of $v_1$, the function $v_1'\Lambda v_1=\sum_{i=1}^mv_{i1}^2\lambda_i$ is maximized at $v_1=(\pm1,0,\dots,0)'$ under
the constraint, which shows that $u_1$ is the first column of $P$, up to the sign. We also see
that other solutions exist when $\lambda_1=\lambda_2$. It is now clear that the vector $P'X$ contains the $m$
principal components of the variance matrix $\Sigma$.
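Both characterizations of the first principal direction can be checked numerically. The sketch below (Python with NumPy) uses a hypothetical covariance matrix `Sigma`. It sorts the eigenpairs returned by `numpy.linalg.eigh` in decreasing order, then verifies three things: $u_1'\Sigma u_1=\lambda_1$, no random unit vector gives a larger value of $u'\Sigma u$, and $P'\Sigma P=\Lambda$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical covariance matrix Sigma (symmetric positive definite).
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

# eigh returns eigenvalues in ascending order: reorder so that lambda_1 >= ... >= lambda_m.
eigvals, P = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

u1 = P[:, 0]                                    # first column of P
print(np.isclose(u1 @ Sigma @ u1, eigvals[0]))  # True: Var(u1'X) = lambda_1

# No unit vector gives a larger variance u'Sigma u.
U = rng.standard_normal((10_000, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)
quad = np.einsum("ij,jk,ik->i", U, Sigma, U)    # u_i' Sigma u_i for each row u_i
print(quad.max() <= eigvals[0] + 1e-12)         # True

# P'Sigma P = Lambda: the components of P'X are uncorrelated with variances lambda_i.
print(np.allclose(P.T @ Sigma @ P, np.diag(eigvals)))  # True
```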


11.13 All the elements of the matrices $D_m^+$, $A_{ik}$ and $D_m$ are nonnegative. Consequently, when $A_{ik}$ is
diagonal, using Exercise 11.4, we obtain

$$0\le D_m^+(A_{ik}\otimes A_{ik})D_m\le\sup_jA_{ik}^2(j,j)\,I_{m(m+1)/2}$$

element by element. The off-diagonal elements are thus bounded above and below by 0, which shows that $D_m^+(A_{ik}\otimes A_{ik})D_m$ is diagonal, and the conclusion
easily follows.


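11.14 With the abuse of notation $B-\lambda I_{mp}=\mathcal{C}(B_1,\dots,B_p)$, the property yields

$$\det(B-\lambda I_{mp})=\det(-\lambda I_m)\det\Big\{\mathcal{C}(B_1,\dots,B_{p-1})+\frac1\lambda\begin{pmatrix}B_p\\0\end{pmatrix}\begin{pmatrix}0&I_m\end{pmatrix}\Big\}$$

$$=\det(-\lambda I_m)\det\Big\{\mathcal{C}\Big(B_1,\dots,B_{p-2},B_{p-1}+\frac{B_p}\lambda\Big)\Big\}.$$

The proof is completed by induction on $p$.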


11.15 Let $A^{1/2}$ be the symmetric positive definite matrix defined by $A^{1/2}A^{1/2}=A$. If $X$ is an
eigenvector associated with the eigenvalue $\lambda$ of the symmetric positive definite matrix
$S=A^{1/2}BA^{1/2}$, then we have $ABA^{1/2}X=\lambda A^{1/2}X$, which shows that the eigenvalues
of $S$ and $AB$ are the same. Write the spectral decomposition as $S=P\Lambda P'$, where $\Lambda$
is diagonal and $P'P=I_m$. We have $AB=A^{1/2}SA^{-1/2}=A^{1/2}P\Lambda P'A^{-1/2}=Q\Lambda Q^{-1}$,
with $Q=A^{1/2}P$.
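The argument can be mirrored numerically. The sketch below (Python with NumPy) takes two hypothetical symmetric positive definite matrices $A$ and $B$ and builds $A^{1/2}$ from the spectral decomposition of $A$ (the helper `sym_sqrt` is ours). It then checks that $S=A^{1/2}BA^{1/2}$ and $AB$ have the same eigenvalues and that $AB=Q\Lambda Q^{-1}$ with $Q=A^{1/2}P$.

```python
import numpy as np


def sym_sqrt(A):
    """Symmetric positive definite square root A^{1/2}, via A = V diag(w) V'."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(w)) @ V.T


# Hypothetical symmetric positive definite matrices.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])

A_half = sym_sqrt(A)
S = A_half @ B @ A_half                          # symmetric positive definite
lam_S, P = np.linalg.eigh(S)                     # S = P Lambda P'
lam_AB = np.sort(np.linalg.eigvals(A @ B).real)  # eigenvalues of AB, ascending

print(np.allclose(lam_S, lam_AB))                # True: same eigenvalues
Q = A_half @ P
print(np.allclose(A @ B, Q @ np.diag(lam_S) @ np.linalg.inv(Q)))  # True
```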


11.16 Let $c=(c_1',c_2')'$ be a nonzero vector, partitioned conformably with $A$ and $B$. We have

$$c'(A+B)c=(c_1'A_{11}c_1+2c_1'A_{12}c_2+c_2'A_{22}c_2)+c_1'B_{11}c_1.$$

On the right-hand side of the equality, the term in parentheses is nonnegative and the last
term is positive, unless $c_1=0$. But in this case $c_2\ne0$ and the term in parentheses becomes
$c_2'A_{22}c_2>0$.


11.17 Take the random matrix $A=XX'$, where $X\sim\mathcal{N}(0,I_p)$ with $p>1$. Obviously $A$ is never
positive definite because this matrix always possesses the eigenvalue 0 but, for all $c\ne0$,
$c'Ac=(c'X)^2>0$ with probability 1.
Chapter 12


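12.1 For $\omega=0$, we obtain the geometric Brownian motion whose solution, in view of (12.13),
is equal to

$$X_t^0=x_0\exp\{(\mu-\sigma^2/2)t+\sigma W_t\}.$$

By Itô's formula, the SDE satisfied by $1/X_t^0$ is

$$d\Big(\frac1{X_t^0}\Big)=\frac1{X_t^0}\{(-\mu+\sigma^2)dt-\sigma dW_t\}.$$

Using the hint, we then have, for $Y_t=X_t/X_t^0$,

$$dY_t=X_t\,d\Big(\frac1{X_t^0}\Big)+\frac1{X_t^0}dX_t-\sigma^2\frac{X_t}{X_t^0}dt=\frac{\omega}{X_t^0}dt.$$

It follows that

$$X_t=X_t^0\Big(1+\omega\int_0^t\frac{ds}{X_s^0}\Big).$$

The positivity follows.

12.2 It suffices to check the conditions of Theorem 12.1, with the Markov chain $(X_{k\tau}^{(\tau)})$.
We have

$$\tau^{-1}E\big(X_{k\tau}^{(\tau)}-X_{(k-1)\tau}^{(\tau)}\mid X_{(k-1)\tau}^{(\tau)}=x\big)=\mu(x),$$

$$\tau^{-1}\operatorname{Var}\big(X_{k\tau}^{(\tau)}-X_{(k-1)\tau}^{(\tau)}\mid X_{(k-1)\tau}^{(\tau)}=x\big)=\sigma^2(x),$$

$$\tau^{-(2+\delta)/2}E\big(|X_{k\tau}^{(\tau)}-X_{(k-1)\tau}^{(\tau)}|^{2+\delta}\mid X_{(k-1)\tau}^{(\tau)}=x\big)=E\big(|\mu(x)\tau^{1/2}+\sigma(x)\epsilon_{k\tau}|^{2+\delta}\big)<\infty,$$

this inequality being uniform on any ball of radius $r$. The assumptions of Theorem 12.1 are
thus satisfied.

12.3 One may take, for instance,

$$\omega_\tau=\omega\tau,\qquad\alpha_\tau=\tau,\qquad\beta_\tau=1-(1+\delta)\tau.$$

It is then easy to check that the limits in (12.23) and (12.25) are null. The limiting diffusion
is thus

$$dX_t=f(\sigma_t)dt+\sigma_tdW_t^1,\qquad d\sigma_t^2=(\omega-\delta\sigma_t^2)dt.$$

The solution of the $\sigma_t^2$ equation is, using Exercise 12.1 with $\sigma=0$,

$$\sigma_t^2=e^{-\delta t}(\sigma_0^2-\omega/\delta)+\omega/\delta,\qquad t\ge0,$$

where $\sigma_0^2$ is the initial value. It is assumed that $\sigma_0^2>\omega/\delta$ and $\delta>0$, in order to guarantee
the positivity. We have

$$\lim_{t\to\infty}\sigma_t^2=\omega/\delta.$$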
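The closed-form solution of $d\sigma_t^2=(\omega-\delta\sigma_t^2)dt$ can be compared with a crude numerical integration. The sketch below (Python with NumPy) uses hypothetical values with $\sigma_0^2>\omega/\delta$ and $\delta>0$, runs an explicit Euler scheme with a small step, and also checks the limit $\omega/\delta$. Euler is chosen only for transparency; its error is of the order of the step size.

```python
import numpy as np


def sigma2_closed_form(t, sigma2_0, omega, delta):
    """sigma_t^2 = exp(-delta t) (sigma_0^2 - omega/delta) + omega/delta."""
    return np.exp(-delta * t) * (sigma2_0 - omega / delta) + omega / delta


def sigma2_euler(T, n_steps, sigma2_0, omega, delta):
    """Explicit Euler scheme for d sigma_t^2 = (omega - delta sigma_t^2) dt on [0, T]."""
    dt = T / n_steps
    s2 = sigma2_0
    for _ in range(n_steps):
        s2 += (omega - delta * s2) * dt
    return s2


# Hypothetical parameter values.
omega, delta, sigma2_0, T = 0.1, 0.5, 1.0, 5.0
exact = sigma2_closed_form(T, sigma2_0, omega, delta)
approx = sigma2_euler(T, 100_000, sigma2_0, omega, delta)
print(abs(exact - approx) < 1e-4)                                              # True
print(np.isclose(sigma2_closed_form(200.0, sigma2_0, omega, delta), omega / delta))  # True
```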
12.4 In view of (12.34) the put price is $P(S,t)=e^{-r(T-t)}E^\pi[(K-S_T)^+\mid S_t]$. We have seen that
the discounted price is a martingale for the risk-neutral probability. Thus $e^{-r(T-t)}E^\pi[S_T\mid S_t]=S_t$. Moreover,

$$(S_T-K)^+-(K-S_T)^+=S_T-K.$$

The result, $C(S,t)-P(S,t)=S_t-e^{-r(T-t)}K$, is obtained by multiplying this equality by $e^{-r(T-t)}$ and taking the expectation
with respect to the probability $\pi$.


12.5 A simple calculation shows that $\partial C(S,t)/\partial S=\Phi(x_t+\sigma\sqrt\tau)\in(0,1)$, where $\tau=T-t$.
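12.6 In view of (12.36), Itô's formula applied to $C_t=C(S_t,t)$ yields

$$dC_t=\Big(\frac{\partial C_t}{\partial t}+\mu S_t\frac{\partial C_t}{\partial S}+\frac12(\sigma S_t)^2\frac{\partial^2C_t}{\partial S^2}\Big)dt+\frac{\partial C_t}{\partial S}\sigma S_tdW_t:=\mu_tC_tdt+\sigma_tC_tdW_t,$$

with, in particular, $\sigma_t=\dfrac{\partial C_t}{\partial S}\dfrac{\sigma S_t}{C_t}$. In view of Exercise 12.5, we thus have

$$\frac{\sigma_t}\sigma=\frac{S_t\Phi(x_t+\sigma\sqrt\tau)}{C_t}=\frac{S_t\Phi(x_t+\sigma\sqrt\tau)}{S_t\Phi(x_t+\sigma\sqrt\tau)-e^{-r\tau}K\Phi(x_t)}.$$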
12.7 Given observations $S_1,\dots,S_n$ of model (12.31), and an initial value $S_0$, the maximum
likelihood estimators of $m=\mu-\sigma^2/2$ and $\sigma^2$ are, in view of (12.33),

$$\hat m=\frac1n\sum_{i=1}^n\log(S_i/S_{i-1}),\qquad\hat\sigma^2=\frac1n\sum_{i=1}^n\{\log(S_i/S_{i-1})-\hat m\}^2.$$

The maximum likelihood estimator of $\mu$ is then

$$\hat\mu=\hat m+\frac{\hat\sigma^2}2.$$
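The estimators above are straightforward to compute. The sketch below (Python with NumPy) assumes observations at unit time intervals, as in the formulas. It applies them to a path simulated with hypothetical daily-scale values $\mu=0.0005$ and $\sigma=0.01$; this is an illustration, not a calibration procedure from the text.

```python
import numpy as np


def gbm_mle(S):
    """Maximum likelihood estimates (m_hat, sigma2_hat, mu_hat) from prices S_0, ..., S_n.

    Assumes unit time steps, so that the log-returns log(S_i / S_{i-1})
    are iid N(m, sigma^2) with m = mu - sigma^2 / 2.
    """
    r = np.diff(np.log(np.asarray(S, dtype=float)))   # log(S_i / S_{i-1}), i = 1, ..., n
    m_hat = r.mean()
    sigma2_hat = np.mean((r - m_hat) ** 2)             # divisor n, as in the MLE
    mu_hat = m_hat + sigma2_hat / 2                    # invariance of the MLE
    return m_hat, sigma2_hat, mu_hat


# Simulated path with hypothetical values mu = 0.0005 and sigma = 0.01.
rng = np.random.default_rng(1)
mu, sigma, n = 0.0005, 0.01, 100_000
log_returns = (mu - sigma**2 / 2) + sigma * rng.standard_normal(n)
S = 100.0 * np.exp(np.concatenate(([0.0], np.cumsum(log_returns))))
m_hat, sigma2_hat, mu_hat = gbm_mle(S)
print(m_hat, sigma2_hat, mu_hat)
```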
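12.8 Denoting by $\phi$ the density of the standard normal distribution, we have

$$\frac{\partial C(S,t)}{\partial\sigma}=S_t\phi(x_t+\sigma\sqrt\tau)\Big(\frac{\partial x_t}{\partial\sigma}+\sqrt\tau\Big)-e^{-r\tau}K\phi(x_t)\frac{\partial x_t}{\partial\sigma}.$$

It is easy to verify that $S_t\phi(x_t+\sigma\sqrt\tau)=e^{-r\tau}K\phi(x_t)$. It follows that

$$\frac{\partial C(S,t)}{\partial\sigma}=S_t\sqrt\tau\,\phi(x_t+\sigma\sqrt\tau)>0.$$

The option buyer wishes to be covered against the risk: he thus agrees to pay more if the
asset is more risky.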
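The Black-Scholes identities used above lend themselves to a numerical check: put-call parity, the delta $\Phi(x_t+\sigma\sqrt\tau)$ and the vega $S_t\sqrt\tau\,\phi(x_t+\sigma\sqrt\tau)$. The sketch below uses the Python standard library only. It assumes the call formula $C=S_t\Phi(x_t+\sigma\sqrt\tau)-e^{-r\tau}K\Phi(x_t)$ with $x_t=\{\log(S_t/K)+(r-\sigma^2/2)\tau\}/(\sigma\sqrt\tau)$, together with the standard companion put formula. The inputs are hypothetical, and derivatives are approximated by central finite differences.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is Phi, N.pdf is phi


def x_t(S, K, r, sigma, tau):
    """x_t = {log(S/K) + (r - sigma^2/2) tau} / (sigma sqrt(tau)), with tau = T - t."""
    return (log(S / K) + (r - sigma**2 / 2) * tau) / (sigma * sqrt(tau))


def call(S, K, r, sigma, tau):
    x = x_t(S, K, r, sigma, tau)
    return S * N.cdf(x + sigma * sqrt(tau)) - exp(-r * tau) * K * N.cdf(x)


def put(S, K, r, sigma, tau):
    x = x_t(S, K, r, sigma, tau)
    return exp(-r * tau) * K * N.cdf(-x) - S * N.cdf(-x - sigma * sqrt(tau))


S, K, r, sigma, tau = 100.0, 95.0, 0.03, 0.25, 0.5   # hypothetical inputs
h = 1e-5                                             # finite-difference step

# Put-call parity: C - P = S_t - e^{-r tau} K.
parity_gap = call(S, K, r, sigma, tau) - put(S, K, r, sigma, tau) - (S - exp(-r * tau) * K)

# Delta: dC/dS = Phi(x_t + sigma sqrt(tau)), which lies in (0, 1).
delta = N.cdf(x_t(S, K, r, sigma, tau) + sigma * sqrt(tau))
delta_fd = (call(S + h, K, r, sigma, tau) - call(S - h, K, r, sigma, tau)) / (2 * h)

# Vega: dC/dsigma = S_t sqrt(tau) phi(x_t + sigma sqrt(tau)) > 0.
vega = S * sqrt(tau) * N.pdf(x_t(S, K, r, sigma, tau) + sigma * sqrt(tau))
vega_fd = (call(S, K, r, sigma + h, tau) - call(S, K, r, sigma - h, tau)) / (2 * h)

print(abs(parity_gap) < 1e-10, abs(delta_fd - delta) < 1e-6, abs(vega_fd - vega) < 1e-4)
# True True True
```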


12.9 We have $S_t=S_{t-1}e^{r-\sigma^2/2+\sigma\eta_t^*}$, where $(\eta_t^*)$ is iid $\mathcal{N}(0,1)$. It follows that

$$E(S_t\mid I_{t-1})=S_{t-1}\exp\Big(r-\frac{\sigma^2}2+\frac{\sigma^2}2\Big)=e^rS_{t-1}.$$

The property immediately follows.
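12.10 The volatility is of the form $\sigma_t^2=\omega+a(\eta_{t-1})\sigma_{t-1}^2$, with $a(x)=\alpha(x-\gamma)^2+\beta$.
Using the results of Chapter 2, the strict and second-order stationarity conditions are
$E\log a(\eta_t)<0$ and $Ea(\eta_t)<1$, respectively.

12.11 We are in the framework of model (12.44), with $\mu_t=r+\lambda\sigma_t-\sigma_t^2/2$. Thus the risk-neutral
model is given by (12.47), with $\eta_t^*=\lambda+\eta_t$ and

$$\sigma_t^2=\omega+\{\alpha(\eta_{t-1}^*-\lambda-\gamma)^2+\beta\}\sigma_{t-1}^2.$$

The constraints (12.41) can be written as

$$e^{-r}=E\{\exp(a_t+b_t\eta_{t+1}+c_t\eta_{t+1}^2)\mid I_t\},$$

$$1=E\{\exp(a_t+b_t\eta_{t+1}+c_t\eta_{t+1}^2+Z_{t+1})\mid I_t\}=E\{\exp(a_t+\mu_{t+1}+\sigma_{t+1}\eta_{t+1}+b_t\eta_{t+1}+c_t\eta_{t+1}^2)\mid I_t\},$$

where $Z_{t+1}=\log(S_{t+1}/S_t)=\mu_{t+1}+\sigma_{t+1}\eta_{t+1}$. It can easily be seen that if $U\sim\mathcal{N}(0,1)$ we have, for $a<1/2$ and for all $b$,

$$E[\exp\{a(U+b)^2\}]=\frac1{\sqrt{1-2a}}\exp\Big(\frac{ab^2}{1-2a}\Big).$$

Writing

$$b_t\eta_{t+1}+c_t\eta_{t+1}^2=c_t\Big(\eta_{t+1}+\frac{b_t}{2c_t}\Big)^2-\frac{b_t^2}{4c_t},$$

we thus obtain

$$e^{-r}=\frac1{\sqrt{1-2c_t}}\exp\Big(a_t+\frac{b_t^2}{2(1-2c_t)}\Big),\qquad c_t<\frac12,$$

and writing

$$(b_t+\sigma_{t+1})\eta_{t+1}+c_t\eta_{t+1}^2=c_t\Big(\eta_{t+1}+\frac{b_t+\sigma_{t+1}}{2c_t}\Big)^2-\frac{(b_t+\sigma_{t+1})^2}{4c_t},$$

we have

$$1=\frac1{\sqrt{1-2c_t}}\exp\Big(a_t+\mu_{t+1}+\frac{(b_t+\sigma_{t+1})^2}{2(1-2c_t)}\Big).$$

It follows that

$$\frac{2b_t\sigma_{t+1}+\sigma_{t+1}^2}{2(1-2c_t)}=r-\mu_{t+1}.$$

Thus

$$\frac{2b_t+\sigma_{t+1}}{2(1-2c_t)}=\frac{\sigma_{t+1}}2-\lambda.$$

There are an infinite number of possible choices for $b_t$ and $c_t$. For instance, if $\lambda=0$, one can
take $b_t=\nu\sigma_{t+1}$ and $c_t=-\nu$ with $\nu>-1/2$. Then $a_t$ follows. The risk-neutral probability
$\pi_{t,t+1}$ is obtained by calculating

$$E^{\pi_{t,t+1}}(e^{uZ_{t+1}}\mid I_t)=E\{\exp(a_t+r+b_t\eta_{t+1}+c_t\eta_{t+1}^2+uZ_{t+1})\mid I_t\}=\exp\Big\{u\Big(r-\frac{\sigma_{t+1}^2}{2(1-2c_t)}\Big)+\frac{u^2}2\,\frac{\sigma_{t+1}^2}{1-2c_t}\Big\}.$$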
Under the risk-neutral probability, we thus have the model

$$\log(S_t/S_{t-1})=r-\frac{\sigma_t^{*2}}2+\epsilon_t^*,\qquad\epsilon_t^*=\sigma_t^*\eta_t^*,\qquad\sigma_t^{*2}=\frac{\sigma_t^2}{1-2c_{t-1}},\qquad\eta_t^*\overset{\text{iid}}{\sim}\mathcal{N}(0,1).\qquad(C.18)$$

Note that the volatilities of the two models (under historical and risk-neutral probability) do
not coincide unless $c_t=0$ for all $t$.

12.12 We have $\operatorname{VaR}_{t,1}(\alpha)=-\log(2\alpha)/\lambda$. It can be shown that the distribution of $L_{t,t+2}$ has the
density $g(x)=0.25\lambda\exp\{-\lambda|x|\}(1+\lambda|x|)$. At horizon 2, the VaR is thus the solution $u$ of
the equation $(2+\lambda u)\exp\{-\lambda u\}=4\alpha$. For instance, for $\lambda=0.1$ we obtain $\operatorname{VaR}_{t,2}(0.01)=51.92$, whereas $\sqrt2\operatorname{VaR}_{t,1}(0.01)=55.32$. The VaR is thus overevaluated when the incorrect
rule is applied, but for other values of $\alpha$ the VaR may be underevaluated: $\operatorname{VaR}_{t,2}(0.05)=32.72$, whereas $\sqrt2\operatorname{VaR}_{t,1}(0.05)=32.56$.

12.13 We have

$$\Delta P_{t+i}-m=A^i(\Delta P_t-m)+U_{t+i}+AU_{t+i-1}+\dots+A^{i-1}U_{t+1}.$$

Thus, introducing the notation $A_i=(I-A^i)(I-A)^{-1}$,

$$L_{t,t+h}=-a'\sum_{i=1}^h\Big\{m+A^i(\Delta P_t-m)+\sum_{j=1}^iA^{i-j}U_{t+j}\Big\}$$

$$=-a'mh-a'AA_h(\Delta P_t-m)-a'\sum_{j=1}^h\Big(\sum_{i=j}^hA^{i-j}\Big)U_{t+j}$$

$$=-a'mh-a'AA_h(\Delta P_t-m)-a'\sum_{j=1}^hA_{h-j+1}U_{t+j}.$$

The conditional law of $L_{t,t+h}$ is thus the $\mathcal{N}(a'\mu_{t,h},a'\Sigma_ha)$ distribution, and (12.58) follows.

12.14 We have

$$\Delta P_{t+2}=\sqrt{\omega+\alpha_1\Delta P_{t+1}^2}\,U_{t+2}=\sqrt{\omega+\alpha_1(\omega+\alpha_1\Delta P_t^2)U_{t+1}^2}\,U_{t+2}.$$

At horizon 2, the conditional distribution of $\Delta P_{t+2}$ is not Gaussian if $\alpha_1>0$, because its
kurtosis coefficient is equal to

$$\frac{E_t\Delta P_{t+2}^4}{(E_t\Delta P_{t+2}^2)^2}=3+\frac{6\alpha_1^2(\omega+\alpha_1\Delta P_t^2)^2}{\{\omega+\alpha_1(\omega+\alpha_1\Delta P_t^2)\}^2}>3.$$

There is no explicit formula for $F_h$ when $h>1$.

12.15 It suffices to note that, conditionally on the available information $I_t$, we have

$$\alpha=P\Big(\frac{p_{t+h}-p_t}{p_t}<-\frac{\operatorname{VaR}_t(h,\alpha)}{p_t}\;\Big|\;I_t\Big)=P\Big(r_{t+1}+\dots+r_{t+h}<\log\Big(1-\frac{\operatorname{VaR}_t(h,\alpha)}{p_t}\Big)\;\Big|\;I_t\Big).$$
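The numerical values quoted above for the Laplace VaR can be reproduced by solving $(2+\lambda u)e^{-\lambda u}=4\alpha$ with a bracketing method. The sketch below uses the Python standard library. Bisection is our choice here; any root finder would do, and the function is decreasing for $u>0$, so the root is unique when $0<\alpha<1/2$. It compares the exact two-period VaR with the $\sqrt2$ rule for $\lambda=0.1$.

```python
from math import exp, log, sqrt


def var_h1(alpha, lam):
    """One-period VaR of a Laplace(lam) loss: P(L > u) = exp(-lam u) / 2 = alpha."""
    return -log(2 * alpha) / lam


def var_h2(alpha, lam, tol=1e-10):
    """Two-period VaR: root u > 0 of (2 + lam u) exp(-lam u) = 4 alpha, 0 < alpha < 1/2."""
    def f(u):
        return (2 + lam * u) * exp(-lam * u) - 4 * alpha   # decreasing in u > 0

    lo, hi = 0.0, 1.0
    while f(hi) > 0:            # enlarge the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:        # bisection
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lam = 0.1
for alpha in (0.01, 0.05):
    print(alpha, round(var_h2(alpha, lam), 2), round(sqrt(2) * var_h1(alpha, lam), 2))
# 0.01 51.92 55.32
# 0.05 32.72 32.56
```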
12.16 For simplicity, in this proof we will omit the indices. Since $L_{t,t+h}$ has the same distribution
as $F^{-1}(U)$, where $U$ denotes a variable uniformly distributed on $[0,1]$, we have

$$E[L\mathbb{1}_{L>\operatorname{VaR}(\alpha)}]=E[F^{-1}(U)\mathbb{1}_{F^{-1}(U)>F^{-1}(1-\alpha)}]=E[F^{-1}(U)\mathbb{1}_{U>1-\alpha}]$$

$$=\int_{1-\alpha}^1F^{-1}(u)du=\int_0^\alpha F^{-1}(1-u)du=\int_0^\alpha\operatorname{VaR}(u)du.$$

Using (12.63), the desired equality follows.
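For a continuous loss distribution, where $E[L\mathbb{1}_{L>\operatorname{VaR}(\alpha)}]=\alpha\operatorname{ES}(\alpha)$, the display above gives $\operatorname{ES}(\alpha)=\alpha^{-1}\int_0^\alpha\operatorname{VaR}(u)du$. The sketch below (Python standard library, `statistics.NormalDist`) checks this numerically for a standard normal loss. In that case the expected shortfall has the known closed form $\phi(\Phi^{-1}(1-\alpha))/\alpha$. The midpoint rule and the number of nodes are arbitrary choices.

```python
from statistics import NormalDist

N = NormalDist()   # standard normal loss distribution F


def es_from_var(alpha, var, n=100_000):
    """(1/alpha) * integral_0^alpha VaR(u) du, approximated by the midpoint rule."""
    h = alpha / n
    return sum(var((k + 0.5) * h) for k in range(n)) * h / alpha


def var_normal(u):
    return N.inv_cdf(1 - u)        # VaR(u) = F^{-1}(1 - u)


alpha = 0.05
es_numeric = es_from_var(alpha, var_normal)
es_closed = N.pdf(N.inv_cdf(1 - alpha)) / alpha   # phi(z_alpha) / alpha
print(round(es_numeric, 3), round(es_closed, 3))
```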


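12.17 The monotonicity, homogeneity and invariance properties follow from (12.62) and from the
VaR properties. For $L_3=L_1+L_2$ we have

$$\alpha\{\operatorname{ES}_1(\alpha)+\operatorname{ES}_2(\alpha)-\operatorname{ES}_3(\alpha)\}=E[L_1(\mathbb{1}_{L_1>\operatorname{VaR}_1(\alpha)}-\mathbb{1}_{L_3>\operatorname{VaR}_3(\alpha)})]+E[L_2(\mathbb{1}_{L_2>\operatorname{VaR}_2(\alpha)}-\mathbb{1}_{L_3>\operatorname{VaR}_3(\alpha)})].$$

Note that

$$(L_1-\operatorname{VaR}_1(\alpha))(\mathbb{1}_{L_1>\operatorname{VaR}_1(\alpha)}-\mathbb{1}_{L_3>\operatorname{VaR}_3(\alpha)})\ge0$$

because the two bracketed terms have the same sign, and similarly for $L_2$. It follows that

$$\alpha\{\operatorname{ES}_1(\alpha)+\operatorname{ES}_2(\alpha)-\operatorname{ES}_3(\alpha)\}\ge\operatorname{VaR}_1(\alpha)E[\mathbb{1}_{L_1>\operatorname{VaR}_1(\alpha)}-\mathbb{1}_{L_3>\operatorname{VaR}_3(\alpha)}]+\operatorname{VaR}_2(\alpha)E[\mathbb{1}_{L_2>\operatorname{VaR}_2(\alpha)}-\mathbb{1}_{L_3>\operatorname{VaR}_3(\alpha)}]=0,$$

since, for continuous distributions, each indicator has expectation $\alpha$. The property is thus shown.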
12.18 The volatility equation is

$$\sigma_t^2=a(\eta_{t-1})\sigma_{t-1}^2,\qquad\text{where }a(x)=\lambda+(1-\lambda)x^2.$$

It follows that

$$\sigma_t^2=a(\eta_{t-1})\cdots a(\eta_0)\sigma_0^2.$$

We have $E\log a(\eta_t)<\log Ea(\eta_t)=0$, the inequality being strict because the distribution
of $a(\eta_t)$ is nondegenerate. In view of Theorem 2.1, this implies that $a(\eta_{t-1})\cdots a(\eta_0)\to0$
a.s., and thus that $\sigma_t^2\to0$ a.s., when $t$ tends to infinity.


12.19 Given, for instance, $\sigma_{t+1}=1$, we have $r_{t+1}=\eta_{t+1}\sim\mathcal{N}(0,1)$ and the distribution of

$$r_{t+2}=\sqrt{\lambda+(1-\lambda)\eta_{t+1}^2}\,\eta_{t+2}$$

is not normal. Indeed, $Er_{t+2}=0$ and $\operatorname{Var}(r_{t+2})=1$, but $E(r_{t+2}^4)=3\{1+2(1-\lambda)^2\}\ne3$.

Similarly, the variable

$$r_{t+1}+r_{t+2}=\eta_{t+1}+\sqrt{\lambda+(1-\lambda)\eta_{t+1}^2}\,\eta_{t+2}$$

is centered with variance 2, but is not normally distributed because

$$E(r_{t+1}+r_{t+2})^4=6\{1+(2-\lambda)^2\}\ne12.$$

Note that the distribution is much more leptokurtic when $\lambda$ is close to 0.
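Both fourth moments above can be computed exactly by quadrature. They are polynomials in the Gaussian variables once the odd powers of the square root, which always multiply odd powers of $\eta_{t+2}$, are seen to vanish. Gauss-Hermite quadrature therefore reproduces them up to rounding. The sketch below (Python with NumPy, `numpy.polynomial.hermite_e.hermegauss`) checks both formulas for a few hypothetical values of $\lambda$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: sum(w * f(x)) / sqrt(2 pi) = E f(eta), eta ~ N(0, 1),
# exact for polynomials of degree <= 2 * 20 - 1.
x, w = hermegauss(20)
w = w / np.sqrt(2 * np.pi)


def moments(lam):
    """E r_{t+2}^4 and E (r_{t+1} + r_{t+2})^4 when sigma_{t+1} = 1."""
    e1, e2 = np.meshgrid(x, x, indexing="ij")      # nodes for eta_{t+1}, eta_{t+2}
    weights = np.outer(w, w)
    r1 = e1
    r2 = np.sqrt(lam + (1 - lam) * e1**2) * e2
    return np.sum(weights * r2**4), np.sum(weights * (r1 + r2) ** 4)


for lam in (0.94, 0.5, 0.1):
    m4, m4_sum = moments(lam)
    print(lam, np.isclose(m4, 3 * (1 + 2 * (1 - lam) ** 2)),
          np.isclose(m4_sum, 6 * (1 + (2 - lam) ** 2)))
```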


Appendix D 


Problems 


Problem 1 


The exercises are independent. Let $(\eta_t)$ be a sequence of iid random variables satisfying $E(\eta_t)=0$
and $\operatorname{Var}(\eta_t)=1$.

Exercise 1: Consider, for all $t\in\mathbb{Z}$, the model

$$\epsilon_t=\sigma_t\eta_t,\qquad\sigma_t=\omega+\sum_{i=1}^q\alpha_i|\epsilon_{t-i}|+\sum_{j=1}^p\beta_j\sigma_{t-j},$$

where the constants satisfy $\omega>0$, $\alpha_i\ge0$, $i=1,\dots,q$ and $\beta_j\ge0$, $j=1,\dots,p$. We also assume
that $\eta_t$ is independent of the past values of $\epsilon_t$. Let $\mu=E|\eta_t|$.


1. Give a necessary condition for the existence of $E|\epsilon_t|$, and give the value of $m=E|\epsilon_t|$.


2. In this question, assume that $p=q=1$.

(a) Establish a sufficient condition for strict stationarity using the representation
$\sigma_t=\omega+a(\eta_{t-1})\sigma_{t-1}$,
and give a strictly stationary solution of the model. It will be assumed that this condition
is also necessary.


(b) Establish a necessary and sufficient condition for the existence of a second-order stationary 
solution. Compute the variance of this solution. 


3. Give a representation of the model which allows the coefficients to be estimated by least squares. 
Exercise 2: The following parametric specifications have been introduced in the ARCH literature.


(i) Quadratic GARCH($p,q$):

$$\epsilon_t=\sigma_t\eta_t,\qquad\sigma_t^2=\Big(\omega+\sum_{i=1}^q\alpha_i\epsilon_{t-i}\Big)^2+\sum_{j=1}^p\beta_j\sigma_{t-j}^2.$$

(ii) Qualitative ARCH of order 1:

$$\epsilon_t=\sigma_t\eta_t,\qquad\sigma_t=\sum_{i=1}^I\alpha_i\mathbb{1}_{A_i}(\epsilon_{t-1}),$$

where the $\alpha_i$ are real coefficients, $(A_i,i=1,\dots,I)$ is a partition of $\mathbb{R}$ and $\mathbb{1}_{A_i}(x)$ is equal
to 1 if $x\in A_i$, and to 0 otherwise.

(iii) Autoregressive stochastic volatility:

$$\epsilon_t=\sigma_t\eta_t,\qquad\sigma_t=\omega+\phi\sigma_{t-1}+\theta u_t,$$

where the $u_t$ are iid variables with mean 0 and variance 1, and are independent of
$(\eta_t,t\in\mathbb{Z})$.

Briefly discuss the dynamics of the solutions (in terms of trajectories and asymmetries),
the constraints on the coefficients and why each of these models is of practical interest.
In the case of model (ii), determine maximum likelihood estimators of the coefficients $\alpha_i$,
assuming that the $A_i$ are known and that $\eta_t$ is normally distributed.


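Exercise 3: Consider the model

$$\epsilon_t=\sigma_{1t}\eta_{1t}+\sigma_{2t}\eta_{2t},\qquad\sigma_{it}^2=\omega_i+\sum_{j=1}^q\alpha_{ij}\epsilon_{t-j}^2+\sum_{j=1}^p\beta_{ij}\sigma_{i,t-j}^2,\quad i=1,2,$$

where $(\eta_{1t},\eta_{2t})$ is an iid sequence with values in $\mathbb{R}^2$ such that $E(\eta_{1t})=E(\eta_{2t})=0$, $\operatorname{Var}(\eta_{1t})=\operatorname{Var}(\eta_{2t})=1$ and $\operatorname{Cov}(\eta_{1t},\eta_{2t})=0$; $\sigma_{1t}$ and $\sigma_{2t}$ belong to the $\sigma$-field generated by the past of $\epsilon_t$;
and $\omega_i>0$, $\alpha_{ij}\ge0$ ($j=1,\dots,q$), and $\beta_{ij}\ge0$ ($j=1,\dots,p$).

1. We assume in this question that there exists a second-order stationary solution $(\epsilon_t)$. Show that
$E\epsilon_t=0$. Under a condition to be specified, compute the variance of $\epsilon_t$.

2. Compute $E(\epsilon_t^2\mid\epsilon_{t-1},\epsilon_{t-2},\dots)$ and $E(\epsilon_t^4\mid\epsilon_{t-1},\epsilon_{t-2},\dots)$. Do GARCH-type solutions of the form
$\epsilon_t=\sigma_t\eta_t$ exist, where $\sigma_t$ belongs to the past of $\epsilon_t$ and $\eta_t$ is independent of that past?

3. In the case where $\beta_{1j}=\beta_{2j}$ for $j=1,\dots,p$, show that $(\epsilon_t)$ is a weak GARCH process.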


Solution


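Exercise 1:
1. We have

$$E|\epsilon_t|=E(\sigma_t)E|\eta_t|=\mu\Big\{\omega+\sum_{i=1}^q\alpha_iE|\epsilon_{t-i}|+\sum_{j=1}^p\beta_jE(\sigma_{t-j})\Big\}$$

and, putting $r=\max(p,q)$ (with $\alpha_i=0$ for $i>q$ and $\beta_i=0$ for $i>p$), $m$ satisfies

$$m\Big\{1-\sum_{i=1}^r(\alpha_i\mu+\beta_i)\Big\}=\mu\omega.$$

The condition is thus $\sum_{i=1}^r(\alpha_i\mu+\beta_i)<1$, and then $m=\mu\omega/\{1-\sum_{i=1}^r(\alpha_i\mu+\beta_i)\}$.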
2. (a) We obtain $a(\eta_{t-1})=\alpha_1|\eta_{t-1}|+\beta_1$. Arguing as in the GARCH(1, 1) case, it can be shown
that a sufficient strict stationarity condition is $E\{\log a(\eta_t)\}<0$.

(b) First assume that there exists a second-order stationary solution $(\epsilon_t)$. We have $E(\epsilon_t^2)=E(\sigma_t^2)$. Thus

$$E(\epsilon_t^2)=\omega^2+(\alpha_1^2+\beta_1^2+2\alpha_1\beta_1\mu)E(\epsilon_{t-1}^2)+2\omega\Big(\alpha_1+\frac{\beta_1}\mu\Big)E|\epsilon_{t-1}|$$

and, since $E(\epsilon_t^2)=E(\epsilon_{t-1}^2)$,

$$(1-\alpha_1^2-\beta_1^2-2\alpha_1\beta_1\mu)E(\epsilon_t^2)=\omega^2+2\omega\Big(\alpha_1+\frac{\beta_1}\mu\Big)E|\epsilon_{t-1}|.$$

Thus we necessarily have

$$\alpha_1^2+\beta_1^2+2\alpha_1\beta_1\mu<1.$$

One can then compute

$$E|\epsilon_t|=\frac{\mu\omega}{1-\alpha_1\mu-\beta_1},\qquad E(\epsilon_t^2)=\frac{(1+\alpha_1\mu+\beta_1)\omega^2}{(1-\alpha_1\mu-\beta_1)(1-\alpha_1^2-\beta_1^2-2\alpha_1\beta_1\mu)}.$$

Conversely, if this condition holds true, that is, if $E\{a^2(\eta_t)\}<1$, by Jensen's inequality
we have

$$E\{\log a(\eta_t)\}=\frac12E\{\log a^2(\eta_t)\}\le\frac12\log E\{a^2(\eta_t)\}<0.$$

Thus there exists a strictly stationary solution of the form

$$\epsilon_t=\eta_t\,\omega\sum_{k=0}^\infty a(\eta_{t-1})\cdots a(\eta_{t-k}),$$

with the convention that $a(\eta_{t-1})\cdots a(\eta_{t-k})=1$ if $k=0$. Using Minkowski's inequality, it
follows that

$$(E\epsilon_t^2)^{1/2}=\omega\Big\{E\Big(\sum_{k=0}^\infty a(\eta_{t-1})\cdots a(\eta_{t-k})\Big)^2\Big\}^{1/2}\le\omega\sum_{k=0}^\infty[E\{a^2(\eta_t)\}]^{k/2}<\infty.$$
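The moment formulas of question 2(b) can be checked by simulation. The sketch below (Python with NumPy) assumes Gaussian $\eta_t$, so that $\mu=\sqrt{2/\pi}$. It uses hypothetical values $\omega=0.1$, $\alpha_1=0.2$, $\beta_1=0.7$, which satisfy $E\{a^2(\eta_t)\}<1$, and compares sample moments of a long simulated path (after a burn-in period) with the theoretical values. Being a Monte Carlo check, the ratios are only approximately 1.

```python
import numpy as np


def simulate(omega, alpha, beta, n, burn=1_000, seed=0):
    """Simulate eps_t = sigma_t eta_t, sigma_t = omega + alpha |eps_{t-1}| + beta sigma_{t-1}."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + burn)                       # Gaussian eta_t (an assumption)
    sigma = omega / (1 - alpha * np.sqrt(2 / np.pi) - beta)   # start at E(sigma_t)
    eps = np.empty(n + burn)
    for t in range(n + burn):
        eps[t] = sigma * eta[t]
        sigma = omega + alpha * abs(eps[t]) + beta * sigma
    return eps[burn:]                                         # drop the burn-in period


omega, alpha, beta = 0.1, 0.2, 0.7                 # hypothetical coefficients
mu = np.sqrt(2 / np.pi)                            # E|eta_t| for a standard Gaussian
Ea = alpha * mu + beta                             # E a(eta_t)
Ea2 = alpha**2 + beta**2 + 2 * alpha * beta * mu   # E a^2(eta_t)
assert Ea2 < 1                                     # second-order stationarity condition

m_theory = mu * omega / (1 - Ea)
v_theory = (1 + Ea) * omega**2 / ((1 - Ea) * (1 - Ea2))

eps = simulate(omega, alpha, beta, n=200_000)
print(np.mean(np.abs(eps)) / m_theory, np.mean(eps**2) / v_theory)   # both ratios near 1
```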


3. Assume that the condition of part 1 is satisfied. We have

$$E(|\epsilon_t|\mid\epsilon_u,u<t)=\mu\sigma_t.$$

Let

$$u_t=|\epsilon_t|-E(|\epsilon_t|\mid\epsilon_u,u<t)$$

be the innovation of $|\epsilon_t|$. We know that $(u_t)$ is a noise (if $E\epsilon_t^2<\infty$). Multiplying the equation
of $\sigma_t$ by $\mu$ and replacing $\mu\sigma_{t-j}$ by $|\epsilon_{t-j}|-u_{t-j}$, we obtain

$$|\epsilon_t|-\sum_{i=1}^r(\alpha_i\mu+\beta_i)|\epsilon_{t-i}|=\omega\mu+u_t-\sum_{j=1}^p\beta_ju_{t-j}.$$

This is an ARMA($r,p$) equation, which can be used to estimate the coefficients by least squares.


Exercise 2: 


Model (i) displays some asymmetry: positive and negative past values of the noise do not have
the same impact on the volatility. Its other properties are close to those of the standard GARCH model:
in particular, it should allow for volatility clustering. Positivity constraints are not necessary on
$\omega$ and the $\alpha_i$ but are required for the $\beta_j$. The volatility is no longer a linear function of the past
squared values, which makes its study more delicate.

Model (ii) has a constant volatility on intervals. The impact of the past values depends only
on the interval of the partition to which they belong. If the $\alpha_i$ are well chosen, the largest values
will have a stronger impact than the smallest values, and the impact could be asymmetric. If $I$ is
large, then the model is more flexible, but numerous coefficients have to be estimated (and thus it
is necessary to have enough observations within each interval).

Model (iii) is a stochastic volatility model. The process $\sigma_t$ cannot be interpreted as a volatility
because it is not positive in general. Moreover, $|\sigma_t|$ is not exactly the volatility of $\epsilon_t$ in the sense
that $\sigma_t^2$ is not the conditional variance of $\epsilon_t$. At least when the distributions of the two noises
are symmetric, the model should be symmetric and the trajectories should resemble those of a
GARCH.

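For model (ii), neglecting the initial value, up to a constant the log-likelihood has the form of
that of the standard GARCH models:

$$L_n(\theta)=-\frac12\sum_{t=1}^n\Big\{\log\sigma_t^2(\theta)+\frac{\epsilon_t^2}{\sigma_t^2(\theta)}\Big\}.$$

Thus the first-order conditions are

$$\sum_{t=1}^n\Big(1-\frac{\epsilon_t^2}{\sigma_t^2(\theta)}\Big)\frac1{\sigma_t^2(\theta)}\frac{\partial\sigma_t^2(\theta)}{\partial\theta}=0.$$

We have $\sigma_t^2(\theta)=\alpha_i^2$ if $\epsilon_{t-1}\in A_i$. Let $T_i=\{t;\epsilon_{t-1}\in A_i\}$ and let $|T_i|$ be the cardinality of this
set. Thus

$$\sum_{t\in T_i}\Big(1-\frac{\epsilon_t^2}{\alpha_i^2}\Big)\frac2{\alpha_i}=0,$$

and finally, the maximum likelihood estimator of $\alpha_i^2$ is

$$\hat\alpha_i^2=\frac1{|T_i|}\sum_{t\in T_i}\epsilon_t^2.$$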
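In practice the estimator above is a cell-by-cell average of $\epsilon_t^2$. The sketch below (Python with NumPy) assumes, for illustration only, that the partition consists of intervals described by a sorted array of edges. The helper `qarch_mle` is hypothetical. The sketch checks it on a small hand-computed example and on a path simulated with hypothetical coefficients.

```python
import numpy as np


def qarch_mle(eps, edges):
    """Cell-by-cell MLE of alpha_i^2 in the qualitative ARCH(1) model (hypothetical helper).

    The partition is assumed to consist of the intervals A_i = [edges[i], edges[i+1]),
    with edges sorted, edges[0] = -inf and edges[-1] = +inf.  Returns the estimates
    (1/|T_i|) * sum_{t in T_i} eps_t^2 and the cell sizes |T_i|.
    """
    eps = np.asarray(eps, dtype=float)
    cell = np.searchsorted(edges, eps[:-1], side="right") - 1   # i such that eps_{t-1} is in A_i
    n_cells = len(edges) - 1
    sizes = np.bincount(cell, minlength=n_cells)
    sums = np.bincount(cell, weights=eps[1:] ** 2, minlength=n_cells)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / sizes, sizes                               # nan for an empty cell


# Simulated qualitative ARCH(1) with three cells and hypothetical coefficients.
rng = np.random.default_rng(2)
edges = np.array([-np.inf, -0.5, 0.5, np.inf])
alpha = np.array([1.5, 0.5, 1.0])          # sigma_t = alpha_i when eps_{t-1} is in A_i
n = 100_000
eta = rng.standard_normal(n)
eps = np.empty(n)
eps[0] = eta[0]
for t in range(1, n):
    i = np.searchsorted(edges, eps[t - 1], side="right") - 1
    eps[t] = alpha[i] * eta[t]

alpha2_hat, sizes = qarch_mle(eps, edges)
print(np.round(alpha2_hat, 2), alpha**2)
```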


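Exercise 3:

1. We immediately have $E(\epsilon_t)=0$ by the independence between the $\eta_{it}$ and the past. Moreover,

$$E(\epsilon_t^2)=E(\sigma_{1t}^2+\sigma_{2t}^2)=\omega_1+\omega_2+\sum_{j=1}^q(\alpha_{1j}+\alpha_{2j})E(\epsilon_{t-j}^2)+\sum_{j=1}^p\{\beta_{1j}E(\sigma_{1,t-j}^2)+\beta_{2j}E(\sigma_{2,t-j}^2)\}.$$

Thus if $\beta_{1j}=\beta_{2j}:=\beta_j$ for all $j$, we obtain, provided the denominator is positive,

$$E(\epsilon_t^2)=\frac{\omega_1+\omega_2}{1-\sum_{j=1}^q(\alpha_{1j}+\alpha_{2j})-\sum_{j=1}^p\beta_j}.$$

2. Let $\mu_{k\ell}=E(\eta_{1t}^k\eta_{2t}^\ell)$. We have

$$E(\epsilon_t^2\mid\epsilon_u,u<t)=\sigma_{1t}^2+\sigma_{2t}^2,$$

$$E(\epsilon_t^4\mid\epsilon_u,u<t)=\sum_{k=0}^4\binom4k\mu_{k,4-k}\,\sigma_{1t}^k\sigma_{2t}^{4-k}.$$

In general, there exists no constant $\kappa$ such that

$$E(\epsilon_t^4\mid\epsilon_u,u<t)=\kappa\{E(\epsilon_t^2\mid\epsilon_u,u<t)\}^2,$$

which shows that $(\epsilon_t)$ is not a strong GARCH process.

3. Let $u_t=\epsilon_t^2-\sigma_{1t}^2-\sigma_{2t}^2$ be the innovation of $\epsilon_t^2$. Replacing $\sigma_{1,t-j}^2+\sigma_{2,t-j}^2$ by $\epsilon_{t-j}^2-u_{t-j}$, we
have

$$\epsilon_t^2=\omega_1+\omega_2+\sum_{j=1}^q(\alpha_{1j}+\alpha_{2j})\epsilon_{t-j}^2+u_t+\sum_{j=1}^p\beta_j(\epsilon_{t-j}^2-u_{t-j}),$$

which shows that $(\epsilon_t^2)$ satisfies an ARMA$\{\max(p,q),p\}$ equation.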
Problem 2

The three parts of this problem are independent (except part III.3).
Consider the model

$$\epsilon_t=a_t\epsilon_{t-1}+\omega_t,$$

where $(a_t)$ and $(\omega_t)$ are two sequences of iid random variables with symmetric distributions such
that $E(a_t)=E(\omega_t)=0$ and

$$\operatorname{Var}(a_t)=\alpha>0,\qquad\operatorname{Var}(\omega_t)=\omega>0,$$

and admitting moments of order 4 at least. Assume in addition that $a_t$ and $\omega_t$ are independent of
the past of $\epsilon_t$ (that is, independent of $\{\epsilon_{t-s},s>0\}$).


Part I: Assume that the sequences (œ+) and (œw) are independent. 


I.1. Verify that, in the sense of Definition 2.1, there exists an ARCH(1) solution (€,), with E (e?) < 
oo. Show that, in general, (€,) is not an ARCH process in the strong sense (Definition 2.2). 


1.2. For k >Q, write €, as a function of €,_, and of variables from the sequences (œ+) and (œr). 
Show that 
E log |a;| < 0 


is a sufficient condition for the existence of a strictly stationary solution. Express this condition 
as a function of œ in the case where a; is normally distributed. (Note that if U ~ N(O, 1), 
then E log |U| = —0.63.) 


1.3. If there exists a second-order stationary solution, determine its mean, E (e+), and its autoco- 
variance function Cov(eé,, €-an), Yh > 0. 


1.4. Establish a necessary and sufficient condition for the existence of a second-order stationarity 
solution. 


1.5. Compute the conditional kurtosis of €,. Is it different from that of a standard strong ARCH? 
1.6. Is the model stable by time aggregation? 

Part II: Now assume that a, = A@;, where A is a constant. 

I.1. Do ARCH solutions exist? 


II.2. What is the standard property of the financial series for which this model seems more appro- 
priate than that of part I? For that property, what should be the sign of à? 


I.3. Assume there exists a strictly stationary solution (€+) such that Æ (4) <00. 
(a) Justify the equality Cov(e,, é? ,) = 0, for all h > 0. 
(b) Compute the autocovariance function of (€). 


(c) Prove that (€,) admits a weak GARCH representation. 


Part III: Assume that the variables œ; and @; are normally distributed and that they are correlated, 
with Corr(a@;, œ) = p € [—1, 1]. Denote the observations by (€1,..., €) and let the parameter 
0 = (a, @, p). 
III.1. Write the log-likelihood $L_n(\theta)$ conditional on $\epsilon_0$.

III.2. Solve the normal equations

$$\frac{\partial L_n}{\partial\theta}(\theta)=0.$$

How can these equations be interpreted?

III.3. The model is first estimated under the constraint $\rho=0$, and then without constraint, on a
series $(r_t)$ of log-returns ($r_t=\log(p_t/p_{t-1})$). The series contains 2000 daily observations.
The estimators are assumed to be asymptotically Gaussian. The following results are obtained
(the estimated standard deviations are given in parentheses):

                        α̂         ω̂          ρ̂         L_n(θ̂)
Constrained model       0.54      0.001      0         −1275.2
                        (0.02)    (0.0002)
Unconstrained model     0.48      0.001      −0.95     −1268.2
                        (0.05)    (0.0003)   (0.04)

Comment on these results. Can we accept the model of part I? Today, the price falls by
1% compared to yesterday. What are the predicted values for the returns of the next two
days? How can we obtain prediction intervals?
Solution

Part I:

I.1. The existence of $E(\epsilon_t^2)$ allows us to compute the first two conditional moments of $\epsilon_t$:

$$E(\epsilon_t\mid\epsilon_u,u<t)=E(a_t)\epsilon_{t-1}+E(\omega_t)=0,$$

$$E(\epsilon_t^2\mid\epsilon_u,u<t)=E(\omega_t^2)+E(a_t^2)\epsilon_{t-1}^2=\omega+\alpha\epsilon_{t-1}^2:=\sigma_t^2,$$

in view of the independence between $a_t$ and $\omega_t$ on one hand, and between these two variables
and $\epsilon_u$, $u<t$, on the other hand. These conditional expectations characterize an ARCH(1)
process. This process is not an ARCH in the strong sense because the variables $\epsilon_t^2/\sigma_t^2=\epsilon_t^2/(\omega+\alpha\epsilon_{t-1}^2)$ are not, in general, independent (see I.5).

I.2. We have, for $k>0$,

$$\epsilon_t=a_t\cdots a_{t-k+1}\epsilon_{t-k}+\omega_t+\sum_{n=1}^{k-1}a_t\cdots a_{t-n+1}\omega_{t-n},$$

where the sum is equal to 0 if $k=1$. Let us show that the series

$$z_t=\omega_t+\sum_{n=1}^\infty a_t\cdots a_{t-n+1}\omega_{t-n}$$

is almost surely well defined. By the Cauchy root test, the series is almost surely absolutely
convergent because

$$|a_t\cdots a_{t-n+1}\omega_{t-n}|^{1/n}=\exp\Big\{\frac1n\sum_{\ell=0}^{n-1}\log|a_{t-\ell}|+\frac1n\log|\omega_{t-n}|\Big\}$$

converges a.s. to $\exp(E\log|a_t|)<1$, by the strong law of large numbers. Moreover, we have
$z_t=a_tz_{t-1}+\omega_t$. Thus $(z_t)$ is a strictly stationary solution of the model. If $a_t=\sqrt\alpha\,U_t$, where
$U_t\sim\mathcal{N}(0,1)$, then $E\log|a_t|=\frac12\log\alpha+E\log|U_t|$, and the condition is given by $\alpha<\exp(1.26)\approx3.525$.
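The threshold in I.2 can be computed exactly. For $U\sim\mathcal{N}(0,1)$, $E\log|U|=-(\gamma+\log2)/2\approx-0.635$, where $\gamma$ is Euler's constant. The sketch below (Python with NumPy) compares the exact bound $2e^{\gamma}$ with the value obtained from the rounded constant $-0.63$, and checks $E\log|U|$ by simulation.

```python
from math import exp, log

import numpy as np

EULER_GAMMA = 0.5772156649015329

# For U ~ N(0, 1), E log|U| = -(gamma + log 2) / 2, about -0.635.
e_log_abs_u = -(EULER_GAMMA + log(2)) / 2

# With a_t = sqrt(alpha) U_t, E log|a_t| = (log alpha)/2 + E log|U| < 0
# if and only if alpha < exp(-2 E log|U|) = 2 exp(gamma).
threshold_exact = exp(-2 * e_log_abs_u)
threshold_rounded = exp(1.26)              # obtained with the rounded value -0.63
print(round(e_log_abs_u, 3), round(threshold_exact, 3), round(threshold_rounded, 3))
# -0.635 3.562 3.525

# Monte Carlo check of E log|U| (Var log|U| = pi^2 / 8, so the error is about 0.001).
rng = np.random.default_rng(3)
mc = np.mean(np.log(np.abs(rng.standard_normal(1_000_000))))
print(abs(mc - e_log_abs_u) < 0.01)
```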

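I.3. We have $E(\epsilon_t)=0$, $\operatorname{Cov}(\epsilon_t,\epsilon_{t-h})=0$ for all $h>0$, and $\operatorname{Var}(\epsilon_t)=\omega/(1-\alpha)$.

I.4. In view of the previous question, a necessary condition is $\alpha<1$. To show that this condition
is sufficient, note that, by Jensen's inequality, it implies the strict stationarity of the solution
$(z_t)$, since $E\log|a_t|\le\frac12\log E(a_t^2)=\frac12\log\alpha<0$. Moreover,

$$E(z_t^2)=E(\omega_t^2)+\sum_{n=1}^\infty E(a_t^2)\cdots E(a_{t-n+1}^2)E(\omega_{t-n}^2)=\omega+\sum_{n=1}^\infty\omega\alpha^n<\infty.$$

I.5. Assuming $E(\epsilon_t^4)<\infty$ and using the symmetry of the distributions of $a_t$ and $\omega_t$, we have

$$E(\epsilon_t^4\mid\epsilon_u,u<t)=E(\omega_t^4)+6\omega\alpha\epsilon_{t-1}^2+E(a_t^4)\epsilon_{t-1}^4.$$

The conditional kurtosis of $\epsilon_t$ is equal to the ratio $E(\epsilon_t^4\mid\epsilon_u,u<t)/\sigma_t^4$. If this coefficient were
constant we would have, for a constant $K$ and for all $t$,

$$E(\omega_t^4)+6\omega\alpha\epsilon_{t-1}^2+E(a_t^4)\epsilon_{t-1}^4=K(\omega+\alpha\epsilon_{t-1}^2)^2.$$

Identifying the coefficients, this requires $K=3$, $E(\omega_t^4)=3\omega^2$ and $E(a_t^4)=3\alpha^2$. Except in such particular cases (for instance when $a_t$ and $\omega_t$ are Gaussian, in which case the model is a strong Gaussian ARCH(1)), this equation has no solution: the conditional kurtosis is not constant, unlike that of a standard strong ARCH.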
I.6. The process $(\epsilon_t^*):=(\epsilon_{2t})$ satisfies

$$\epsilon_t^*=a_{2t}a_{2t-1}\epsilon_{t-1}^*+\omega_{2t}+a_{2t}\omega_{2t-1}=a_t^*\epsilon_{t-1}^*+\omega_t^*.$$

The independence assumptions of the initial model are not all satisfied because $a_t^*$ and $\omega_t^*$ are
not independent. The time aggregation thus holds only if a dependence between the variables
$\omega_t$ and $a_t$ is allowed in the model.


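Part II:

II.1. We have

$$E(\epsilon_t^2\mid\epsilon_u,u<t)=\omega(\lambda\epsilon_{t-1}+1)^2,$$

which is incompatible with the conditional variance of an ARCH process.

II.2. The model is asymmetric (leverage effect) because the volatility depends on the sign of $\epsilon_{t-1}$. We should have
$\lambda<0$, so that the volatility increases more when the stock price falls than when it rises by
the same magnitude.

II.3. (a) The equality follows from the independence between $\omega_t$ and all the past variables, since $E(\omega_t)=0$.

(b) For $h>1$,

$$\operatorname{Cov}(\epsilon_t^2,\epsilon_{t-h}^2)=\operatorname{Cov}\{\omega_t^2(\lambda\epsilon_{t-1}+1)^2,\epsilon_{t-h}^2\}=\omega\operatorname{Cov}(\lambda^2\epsilon_{t-1}^2+2\lambda\epsilon_{t-1}+1,\epsilon_{t-h}^2)=\omega\lambda^2\operatorname{Cov}(\epsilon_{t-1}^2,\epsilon_{t-h}^2),$$

using (a) ($\operatorname{Cov}(\epsilon_{t-1},\epsilon_{t-h}^2)=0$ for $h>1$). For $h=1$, we obtain

$$\operatorname{Cov}(\epsilon_t^2,\epsilon_{t-1}^2)=\omega\operatorname{Cov}(\lambda^2\epsilon_{t-1}^2+2\lambda\epsilon_{t-1}+1,\epsilon_{t-1}^2)=\omega\lambda^2\operatorname{Var}(\epsilon_{t-1}^2)+2\omega\lambda E(\epsilon_{t-1}^3).$$

We have $E(\epsilon_{t-1}^3)=E(\omega_{t-1}^3)E(\lambda\epsilon_{t-2}+1)^3=0$, because the distribution of $\omega_{t-1}$ is
symmetric. Finally, the relation

$$\operatorname{Cov}(\epsilon_t^2,\epsilon_{t-h}^2)=\omega\lambda^2\operatorname{Cov}(\epsilon_{t-1}^2,\epsilon_{t-h}^2)$$

is true for $h>0$. For $h=0$ we have

$$E(\epsilon_t^4)=\frac{E(\omega_t^4)\{6\lambda^2E(\epsilon_t^2)+1\}}{1-\lambda^4E(\omega_t^4)},\qquad E(\epsilon_t^2)=\frac\omega{1-\lambda^2\omega},$$

which allows us to obtain $\operatorname{Var}(\epsilon_t^2)$ and finally the whole autocovariance function of $(\epsilon_t^2)$.

(c) The recursive relation between the autocovariances of $(\epsilon_t^2)$ implies that the process is an
AR(1) of the form

$$\epsilon_t^2=\omega+\omega\lambda^2\epsilon_{t-1}^2+u_t,$$

where $(u_t)$ is a noise. The process $(\epsilon_t)$ is thus an ARCH(1) in the weak sense.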
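Part III:

III.1. We have

$$L_n(\theta)=-\frac12\Big(\sum_{t=1}^n\log2\pi+\sum_{t=1}^n\log\sigma_t^2+\sum_{t=1}^n\frac{\epsilon_t^2}{\sigma_t^2}\Big),$$

where

$$\sigma_t^2=\omega+\alpha\epsilon_{t-1}^2+2\rho\sqrt{\alpha\omega}\,\epsilon_{t-1}.$$

III.2. The normal equations are

$$\sum_{t=1}^n\frac{\epsilon_t^2-\sigma_t^2}{\sigma_t^4}\,\frac{\partial\sigma_t^2}{\partial\theta}=0,$$

with

$$\frac{\partial\sigma_t^2}{\partial\theta}=\Big(\epsilon_{t-1}^2+\rho\epsilon_{t-1}\sqrt{\omega/\alpha},\ 1+\rho\epsilon_{t-1}\sqrt{\alpha/\omega},\ 2\sqrt{\alpha\omega}\,\epsilon_{t-1}\Big)'.$$

These equations can be interpreted, for $n$ large, as orthogonality relations.

III.3. Note that the estimated coefficients are significant at the 5% level. The constrained model
(and thus that of part I) is rejected by the likelihood ratio test: the statistic $2(1275.2-1268.2)=14$ exceeds 3.84, the 5% critical value of the $\chi^2_1$ distribution. The sign of $\hat\rho$ is what
was expected. The optimal prediction is 0, regardless of the horizon. We have $E(\epsilon_{t+1}^2\mid\epsilon_u,u\le t)=\sigma_{t+1}^2$ and $E(\epsilon_{t+2}^2\mid\epsilon_u,u\le t)=\alpha\sigma_{t+1}^2+\omega$. The estimated volatility for the next day is

$$\hat\sigma_{t+1}^2=0.48(\log0.99)^2+0.001-2\times0.95\times\sqrt{0.48\times0.001}\times\log0.99=0.0015.$$

The 95% prediction intervals are thus:

at horizon 1, $[-1.96\hat\sigma_{t+1};\,1.96\hat\sigma_{t+1}]=[-0.075;\,0.075]$;
at horizon 2, $[-1.96\sqrt{\hat\alpha\hat\sigma_{t+1}^2+\hat\omega};\,1.96\sqrt{\hat\alpha\hat\sigma_{t+1}^2+\hat\omega}]=[-0.081;\,0.081]$.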
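The numerical values in III.3 follow from a few lines of arithmetic. The sketch below (Python standard library) plugs the unconstrained estimates into the conditional variance formula of III.1 with $\epsilon_t=\log0.99$. It then computes the interval half-widths and recomputes the likelihood ratio statistic.

```python
from math import log, sqrt

alpha_hat, omega_hat, rho_hat = 0.48, 0.001, -0.95   # unconstrained estimates
eps_t = log(0.99)                                    # today's return: a 1% price fall

# sigma_{t+1}^2 = omega + alpha eps_t^2 + 2 rho sqrt(alpha omega) eps_t
s2_1 = omega_hat + alpha_hat * eps_t**2 + 2 * rho_hat * sqrt(alpha_hat * omega_hat) * eps_t
# E(eps_{t+2}^2 | past) = alpha sigma_{t+1}^2 + omega
s2_2 = alpha_hat * s2_1 + omega_hat

lr_stat = 2 * (-1268.2 - (-1275.2))   # likelihood ratio statistic for rho = 0
print(round(s2_1, 4), round(1.96 * sqrt(s2_1), 3), round(1.96 * sqrt(s2_2), 3), round(lr_stat, 1))
# 0.0015 0.075 0.081 14.0
```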
Problem 3

Let $(\eta_t)$ be a sequence of iid random variables such that $E(\eta_t)=0$ and $\operatorname{Var}(\eta_t)=1$. Consider, for
all $t\in\mathbb{Z}$, the model

$$\epsilon_t=\sigma_t\eta_t,\qquad\sigma_t^r=\omega+\sum_{i=1}^q\alpha_i|\epsilon_{t-i}|^r+\sum_{j=1}^p\beta_j\sigma_{t-j}^r,\qquad(D.1)$$

where $r$ is a positive real. The other constants satisfy:

$$\omega>0,\qquad\alpha_i\ge0,\ i=1,\dots,q,\qquad\beta_j\ge0,\ j=1,\dots,p.\qquad(D.2)$$

1. Assume in this question that $p=q=1$.

(a) Establish a sufficient condition for strict stationarity and give a strictly stationary solution
of the model.

(b) Give this condition in the case where $\beta_1=0$ and compare the conditions corresponding to
the different values of $r$.

(c) Establish a necessary and sufficient condition for the existence of a nonanticipative strictly
stationary solution such that $E|\epsilon_t|^{2r}<\infty$. Compute this expectation.

(d) Assume in this question that $r<0$. What might be the problem with that specification?

2. Propose an extension of the model that could take into account the typical asymmetric property
of the financial series.

3. Show that from a solution $(\epsilon_t)$ of (D.1), one can define a solution $(\epsilon_t^*)$ of a standard GARCH
model (with $r=2$).

4. Show that if $E|\epsilon_t|^{2r}<\infty$, then $(|\epsilon_t|^r)$ is an ARMA process whose orders will be given.

5. Assume that we observe a series $(\epsilon_t)$ of log-returns ($\epsilon_t=\log(p_t/p_{t-1})$). For different powers
of $|\epsilon_t|$, ARMA models are estimated by least squares. The orders of these ARMA models are
identified by information criteria. We obtain the following results, $v_t$ denoting a noise.

Estimated model

What values of $r$ are compatible with model (D.1)–(D.2)? Assuming that the distribution
of $\eta_t$ is known, what are the corresponding parameters $\alpha_i$ and $\beta_j$?

6. In view of the previous questions, discuss the interest of the models defined by (D.1).
Solution

1. (a) We have

$$\sigma_t^r=\omega+\alpha_1|\epsilon_{t-1}|^r+\beta_1\sigma_{t-1}^r=\omega+(\alpha_1|\eta_{t-1}|^r+\beta_1)\sigma_{t-1}^r=\omega+a(\eta_{t-1})\sigma_{t-1}^r.$$

Thus, for $N>0$,

$$\sigma_t^r=\omega\Big\{1+\sum_{n=1}^Na(\eta_{t-1})\cdots a(\eta_{t-n})\Big\}+a(\eta_{t-1})\cdots a(\eta_{t-N-1})\sigma_{t-N-1}^r.$$

Proceeding as for the standard GARCH(1, 1) model, using the Cauchy root test, it is shown
that if

$$E\log\{a(\eta_t)\}<0$$

the process $(h_t)$, defined by

$$h_t=\lim_{N\to\infty}\text{a.s. }\omega\Big\{1+\sum_{n=1}^Na(\eta_{t-1})\cdots a(\eta_{t-n})\Big\},$$

exists, takes real positive values, is strictly stationary and satisfies $h_t=\omega+a(\eta_{t-1})h_{t-1}$.
A strictly stationary solution of the model is obtained by $\epsilon_t=h_t^{1/r}\eta_t$. This solution is
nonanticipative, because $\epsilon_t$ is a function of $\eta_t$ and of its past.

(b) If $\beta_1=0$, the condition is given by

$$\alpha_1<e^{-rE\log|\eta_t|}.$$

Using the Jensen inequality, it can be seen that

$$E\log|\eta_t|=E\Big\{\frac12\log\eta_t^2\Big\}<\frac12\log E\eta_t^2=0.$$

The conditions are thus less restrictive when $r$ is larger.

(c) If $(\epsilon_t)$ is nonanticipative and stationary and admits a moment of order $2r$, we have $E(|\epsilon_t|^{2r})=E(\sigma_t^{2r})E(|\eta_t|^{2r})$ and, since $\eta_{t-1}$ and $\sigma_{t-1}$ are independent,

$$E(\sigma_t^{2r})=E\{\omega+a(\eta_{t-1})\sigma_{t-1}^r\}^2=\omega^2+2\omega E\{a(\eta_{t-1})\}E(\sigma_{t-1}^r)+E\{a(\eta_{t-1})^2\}E(\sigma_{t-1}^{2r}).$$

Thus, by stationarity,

$$[1-E\{a(\eta_t)^2\}]E(\sigma_t^{2r})=\omega^2+2\omega E\{a(\eta_t)\}E(\sigma_t^r)>0.$$

A necessary condition is thus $E\{a(\eta_t)^2\}<1$. Conversely, if this condition is satisfied we
have, by Minkowski's inequality,

$$\{E(h_t^2)\}^{1/2}\le\omega\sum_{n=0}^\infty[E\{a(\eta_t)^2\}]^{n/2}<\infty.$$

The stationary solution that was previously given thus admits a moment of order $2r$. Using
the previous calculation, with $E(\sigma_t^r)=\omega/[1-E\{a(\eta_t)\}]$, we obtain

$$E|\epsilon_t|^{2r}=E|\eta_t|^{2r}\Big\{\frac{\omega^2}{1-E\{a(\eta_t)^2\}}+\frac{2\omega^2E\{a(\eta_t)\}}{[1-E\{a(\eta_t)^2\}][1-E\{a(\eta_t)\}]}\Big\}.$$
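(d) If $r<0$, $a(\eta_t)$ is not defined when $\eta_t=0$. The same holds for the strictly stationary
solution when one of the variables $\eta_{t-i}$ is null. However, the probability of such an event
is null if $P(\eta_t=0)=0$.

2. The specification

$$\sigma_t^r=\omega+\sum_{i=1}^q(\alpha_{i,+}\mathbb{1}_{\epsilon_{t-i}>0}+\alpha_{i,-}\mathbb{1}_{\epsilon_{t-i}<0})|\epsilon_{t-i}|^r+\sum_{j=1}^p\beta_j\sigma_{t-j}^r,$$

with $\alpha_{i,+}\ge0$ and $\alpha_{i,-}\ge0$, induces a different impact on the volatility at time $t$ for the positive
and negative past values of $\epsilon_t$. For $\alpha_{i,+}=\alpha_{i,-}$, we retrieve model (D.1).

3. Let $\epsilon_t^*=|\epsilon_t|^{r/2}\nu_t$, where $(\nu_t)$ is an iid process, independent of $(\epsilon_t)$ and taking the values $-1$
and 1 with probability 1/2. Let $\mu_r=E|\eta_t|^r$. We have $\epsilon_t^*=\sigma_t^*\eta_t^*$ with

$$\sigma_t^{*2}=\mu_r\sigma_t^r=\mu_r\omega+\sum_{i=1}^q\mu_r\alpha_i\epsilon_{t-i}^{*2}+\sum_{j=1}^p\beta_j\sigma_{t-j}^{*2}$$

and $(\eta_t^*)=(\mu_r^{-1/2}|\eta_t|^{r/2}\nu_t)$ is an iid process with mean 0 and variance 1. The process $(\epsilon_t^*)$ is thus a
standard GARCH.

4. The innovation of $|\epsilon_t|^r$ is defined by

$$u_t=|\epsilon_t|^r-E(|\epsilon_t|^r\mid\epsilon_{t-1},\epsilon_{t-2},\dots)=|\epsilon_t|^r-\sigma_t^r\mu_r$$

with $E|\eta_t|^r=\mu_r$. It follows that

$$|\epsilon_t|^r=\omega\mu_r+\sum_{i=1}^{\max(p,q)}(\alpha_i\mu_r+\beta_i)|\epsilon_{t-i}|^r+u_t-\sum_{j=1}^p\beta_ju_{t-j}$$

and $(|\epsilon_t|^r)$ is an ARMA$\{\max(p,q),p\}$ process.

5. Noting that the order and coefficients of the AR part in the ARMA representation are greater (in
absolute value) than those of the MA part, we see that only the model for $r=1$ is compatible
with the class of model that is considered here. We have $r=p=1$ and $q=2$, and the estimated
coefficients are obtained by identifying the estimated ARMA(2, 1) model with the representation of question 4. If this model is written $|\epsilon_t|=\hat c+\hat\phi_1|\epsilon_{t-1}|+\hat\phi_2|\epsilon_{t-2}|+v_t-\hat\theta_1v_{t-1}$, then

$$\hat\beta_1=\hat\theta_1,\qquad\hat\alpha_1=\frac{\hat\phi_1-\hat\theta_1}{\mu_1},\qquad\hat\alpha_2=\frac{\hat\phi_2}{\mu_1},\qquad\hat\omega=\frac{\hat c}{\mu_1}.$$

6. The proposed class can take into account the same empirical properties as the standard GARCH
models, but is more flexible because of the extra parameter $r$.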
Problem 4


Let $(\eta_t)$ be a sequence of iid random variables such that $E(\eta_t) = 0$, $\mathrm{Var}(\eta_t) = 1$ and $E\eta_t^4 = \mu_4$. Let $(a_t)$ be another sequence of iid random variables, independent of the sequence $(\eta_t)$, taking the values 0 and 1 and such that

$$P(a_t = 1) = p, \quad P(a_t = 0) = 1 - p, \quad 0 < p < 1.$$

Consider, for all $t \in \mathbb{Z}$, the model

$$\epsilon_t = \{\sigma_{1t}a_t + \sigma_{2t}(1 - a_t)\}\eta_t, \qquad (D.3)$$
$$\sigma_{it}^2 = \omega_i + \alpha_i\epsilon_{t-1}^2, \quad i = 1, 2, \qquad (D.4)$$

where $\omega_i > 0$ and $\alpha_i \ge 0$, $i = 1, 2$.

A solution such that $\epsilon_t$ is independent of the future variables $\eta_{t+h}$ and $a_{t+h}$, $h > 0$, is called nonanticipative.


1. What are the values of the parameters corresponding to the standard ARCH(1)? What kind of 
trajectories can be obtained with that specification? 


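2. In order to obtain a strict stationarity condition, write (D.4) in the form

$$Z_t := \begin{pmatrix}\sigma_{1t}^2\\ \sigma_{2t}^2\end{pmatrix} = \begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix} + A_{t-1}\begin{pmatrix}\sigma_{1,t-1}^2\\ \sigma_{2,t-1}^2\end{pmatrix}, \qquad (D.5)$$

where $A_{t-1}$ is a matrix depending on $\eta_{t-1}$, $a_{t-1}$, $\alpha_1$ and $\alpha_2$.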


3. Deduce a strict stationarity condition for the process $(Z_t)$, and then for the process $(\epsilon_t)$, as a function of the Lyapunov coefficient

$$\gamma = \lim_{t\to\infty}\ \text{a.s.}\ \frac{1}{t}\log\|A_tA_{t-1}\cdots A_1\|$$

(justify the existence of $\gamma$ and briefly outline the steps of the proof). Note that $A_t$ is the product of a column vector by a row vector. Deduce the following simple expression for the strict stationarity condition:

$$\alpha_1^{p}\alpha_2^{1-p} < c,$$

for a constant $c$ that will be specified. How can the condition be interpreted?


4. Give a necessary condition for the existence of a nonanticipative second-order stationary solution. It can be assumed that this condition implies strict stationarity. Deduce that the necessary second-order stationarity condition is also sufficient. Compute the variance of $\epsilon_t$.


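5. We now consider predictions of future values of $\epsilon_t$ and of its square. Give expressions for $E(\epsilon_{t+h} \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots)$ and $E(\epsilon_{t+h}^2 \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots)$, for $h \ge 0$, as functions of $\epsilon_{t-1}$.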


6. What is the conditional kurtosis of $\epsilon_t$? Is there a standard ARCH solution to the model?


7. Assuming that the distribution of $\eta_t$ is standard normal, write down the likelihood of the model. For a given series of 2000 observations, a standard ARCH(1) model is estimated, and then model (D.3)-(D.4) is estimated. The estimators are assumed to be asymptotically Gaussian. The results are presented in the following table (the estimated standard deviations are given in parentheses, and $L_n(\cdot)$ denotes the likelihood):

                       ω̂_1       α̂_1      ω̂_2       α̂_2      p̂        log L_n(θ̂)
    ARCH(1)            0.002     0.6      -         -        -        -1275.2
                       (0.001)   (0.2)
    Model (D.3)-(D.4)  0.001     0.10     0.005     1.02     0.72     -1268.2
                       (0.001)   (0.03)   (0.000)   (0.23)   (0.12)

Comment on these results. Can we accept the general model? (We have $P\{\chi^2(3) > 7.81\} = 0.05$.)


8. Discuss the estimation of the model by OLS. 
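A direct simulation of (D.3)-(D.4) helps visualize the regime switches discussed in question 1. The sketch below makes several assumptions of its own:

- $\eta_t$ is standard Gaussian.
- The path starts from zero and the first 500 draws are discarded as burn-in.
- The function names are hypothetical.

It takes as given the strict stationarity condition $\alpha_1^p\alpha_2^{1-p} < \exp\{-E(\log\eta_t^2)\}$ derived in the solution to question 3. For a standard Gaussian, $E\log\eta_t^2 = -(\gamma + \log 2)$, where $\gamma$ here denotes Euler's constant. The simulated parameter values are the estimates reported in question 7.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def simulate_mixture_arch(n, omega1, alpha1, omega2, alpha2, p, seed=0, burn=500):
    """Simulate (D.3)-(D.4) with standard Gaussian eta_t (an assumption)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + burn)
    regime1 = rng.random(n + burn) < p        # a_t = 1 with probability p
    eps = np.empty(n + burn)
    prev_sq = 0.0                              # eps_{t-1}^2, started at 0
    for t in range(n + burn):
        sigma1 = math.sqrt(omega1 + alpha1 * prev_sq)
        sigma2 = math.sqrt(omega2 + alpha2 * prev_sq)
        eps[t] = (sigma1 if regime1[t] else sigma2) * eta[t]
        prev_sq = eps[t] ** 2
    return eps[burn:]                          # drop the burn-in


def strictly_stationary(alpha1, alpha2, p):
    """alpha1^p alpha2^(1-p) < exp{-E log eta^2}, compared on the log scale.

    Uses E log eta^2 = -(gamma + log 2) for eta ~ N(0, 1); needs alpha1, alpha2 > 0.
    """
    e_log_eta2 = -(EULER_GAMMA + math.log(2.0))
    return p * math.log(alpha1) + (1 - p) * math.log(alpha2) < -e_log_eta2


def second_order_stationary(alpha1, alpha2, p):
    return p * alpha1 + (1 - p) * alpha2 < 1


eps = simulate_mixture_arch(2000, 0.001, 0.10, 0.005, 1.02, 0.72)
print(strictly_stationary(10.0, 0.5, 0.3), second_order_stationary(10.0, 0.5, 0.3))  # True False
```

With $\alpha_1 = 10$, $\alpha_2 = 0.5$ and $p = 0.3$, the process is strictly stationary although $p\alpha_1 + (1-p)\alpha_2 = 3.35 > 1$. A large coefficient is harmless when the regime it belongs to is rare.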
Solution


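1. The standard ARCH(1) is obtained for $\omega_1 = \omega_2$ and $\alpha_1 = \alpha_2$, whatever the value of $p$. It is also obtained for $p = 0$, whatever the values of $\omega_1$ and $\alpha_1$, and for $p = 1$, whatever the values of $\omega_2$ and $\alpha_2$. The trajectories may display abrupt changes of volatility (for instance, if $\omega_1$ and $\omega_2$ are very different).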


2. We obtain equation (D.5) with

$$A_{t-1} = \eta_{t-1}^2\begin{pmatrix}\alpha_1a_{t-1} & \alpha_1(1 - a_{t-1})\\ \alpha_2a_{t-1} & \alpha_2(1 - a_{t-1})\end{pmatrix}.$$


3. The existence of $\gamma$ requires $E\log^+\|A_t\| < \infty$. This condition is satisfied because $E\|A_t\| < \infty$, for example with the norm defined by $\|A\| = \sum_{i,j}|a_{ij}|$. The strict stationarity condition $\gamma < 0$ is shown as for the standard GARCH. Under this condition the strictly stationary solution of (D.5) is given by

$$Z_t = \Big(I_2 + \sum_{k=1}^{\infty}A_{t-1}A_{t-2}\cdots A_{t-k}\Big)\begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix}.$$


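Note that

$$A_t = \eta_t^2\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}(a_t,\ 1 - a_t).$$

Thus

$$A_tA_{t-1} = \eta_t^2\eta_{t-1}^2\{\alpha_1a_t + \alpha_2(1 - a_t)\}\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}(a_{t-1},\ 1 - a_{t-1}),$$

$$A_tA_{t-1}\cdots A_1 = \prod_{i=0}^{t-1}\eta_{t-i}^2\prod_{i=0}^{t-2}\{\alpha_1a_{t-i} + \alpha_2(1 - a_{t-i})\}\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}(a_1,\ 1 - a_1),$$

and

$$\|A_tA_{t-1}\cdots A_1\| = \prod_{i=0}^{t-1}\eta_{t-i}^2\prod_{i=0}^{t-2}\{\alpha_1a_{t-i} + \alpha_2(1 - a_{t-i})\}\left\|\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}(a_1,\ 1 - a_1)\right\|$$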

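because $\alpha_1$ and $\alpha_2$ are positive. It follows that, by the strong law of large numbers,

$$\frac{1}{t}\log\|A_tA_{t-1}\cdots A_1\| = \frac{1}{t}\sum_{i=0}^{t-1}\log\eta_{t-i}^2 + \frac{1}{t}\sum_{i=0}^{t-2}\log\{\alpha_1a_{t-i} + \alpha_2(1 - a_{t-i})\} + \frac{1}{t}\log\left\|\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}(a_1,\ 1 - a_1)\right\| \to E\log\eta_1^2 + E\log\{\alpha_1a_1 + \alpha_2(1 - a_1)\}$$

almost surely as $t \to \infty$. The second expectation is equal to

$$p\log\alpha_1 + (1 - p)\log\alpha_2 = \log(\alpha_1^p\alpha_2^{1-p}).$$

It follows that

$$\gamma < 0 \iff \alpha_1^p\alpha_2^{1-p} < \exp\{-E(\log\eta_t^2)\}.$$

We note that the condition is satisfied even if one of the coefficients, for instance $\alpha_1$, is large, provided that the corresponding probability, $p$, is not too large.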
4. If $(\epsilon_t)$ is second-order stationary, we have

$$E\epsilon_t^2 = p\{\omega_1 + \alpha_1E\epsilon_t^2\} + (1 - p)\{\omega_2 + \alpha_2E\epsilon_t^2\}.$$

The necessary condition is thus

$$\bar\alpha := p\alpha_1 + (1 - p)\alpha_2 < 1,$$

and we have

$$\mathrm{Var}(\epsilon_t) = E\epsilon_t^2 = \frac{p\omega_1 + (1 - p)\omega_2}{1 - p\alpha_1 - (1 - p)\alpha_2}.$$

Conversely, suppose that this condition is satisfied and that it implies that $\gamma < 0$. We have

$$E(A_t) = \begin{pmatrix}\alpha_1p & \alpha_1(1 - p)\\ \alpha_2p & \alpha_2(1 - p)\end{pmatrix}.$$

This matrix has rank 1. It thus admits a zero eigenvalue and a nonzero eigenvalue equal to its trace, $\bar\alpha$. This coefficient, less than 1 by assumption, is also the spectral radius of $E(A_t)$. It follows that the expectation of the stationary solution $Z_t$ defined above is finite, because, by independence, $E(A_{t-1}A_{t-2}\cdots A_{t-k}) = \{E(A_t)\}^k$.


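5. We clearly have $E(\epsilon_{t+h} \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots) = 0$. Let $\omega_t = \omega_1a_t + \omega_2(1 - a_t)$ and $\alpha_t = \alpha_1a_t + \alpha_2(1 - a_t)$. Since $a_t(1 - a_t) = 0$, we have

$$\epsilon_{t+h}^2 = \eta_{t+h}^2\{\omega_1a_{t+h} + \omega_2(1 - a_{t+h})\} + \eta_{t+h}^2\{\alpha_1a_{t+h} + \alpha_2(1 - a_{t+h})\}\epsilon_{t+h-1}^2$$
$$= \omega_{t+h}\eta_{t+h}^2 + \alpha_{t+h}\eta_{t+h}^2\epsilon_{t+h-1}^2$$
$$= \omega_{t+h}\eta_{t+h}^2 + \alpha_{t+h}\eta_{t+h}^2\omega_{t+h-1}\eta_{t+h-1}^2 + \cdots + \alpha_{t+h}\eta_{t+h}^2\cdots\alpha_t\eta_t^2\epsilon_{t-1}^2.$$

Let $\bar\omega = E\omega_t$, and recall that $\bar\alpha = E\alpha_t$. Since $(\eta_t, a_t)$ is independent of the past, it follows that

$$E(\epsilon_{t+h}^2 \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots) = \bar\omega(1 + \bar\alpha + \cdots + \bar\alpha^h) + \bar\alpha^{h+1}\epsilon_{t-1}^2$$

for $h \ge 0$.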
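6. The conditional kurtosis is equal to

$$\frac{E(\epsilon_t^4 \mid \epsilon_{t-1}, \ldots)}{\{E(\epsilon_t^2 \mid \epsilon_{t-1}, \ldots)\}^2} = \frac{E\eta_t^4\{p\sigma_{1t}^4 + (1 - p)\sigma_{2t}^4\}}{[E\eta_t^2\{p\sigma_{1t}^2 + (1 - p)\sigma_{2t}^2\}]^2} = \frac{\mu_4\{p\sigma_{1t}^4 + (1 - p)\sigma_{2t}^4\}}{\{p\sigma_{1t}^2 + (1 - p)\sigma_{2t}^2\}^2}.$$

This coefficient depends on $t$ in general. Hence there is no standard GARCH solution to this model, except in the cases mentioned in part 1.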


7. The conditional density of $\epsilon_t$ is written as

$$l_t = p\frac{1}{\sqrt{2\pi\sigma_{1t}^2}}\exp\left(-\frac{\epsilon_t^2}{2\sigma_{1t}^2}\right) + (1 - p)\frac{1}{\sqrt{2\pi\sigma_{2t}^2}}\exp\left(-\frac{\epsilon_t^2}{2\sigma_{2t}^2}\right).$$

The likelihood of the sample is the product of the $l_t$ for $t = 1, \ldots, n$, so the log-likelihood is the sum of the $\log l_t$.

The estimation results display very different estimated coefficients $\alpha_1$ and $\alpha_2$. Moreover, the likelihood ratio test amounts to comparing twice the difference of the log-likelihoods with the quantile of order 0.95 of a $\chi^2(3)$ distribution. Since $2 \times (1275.2 - 1268.2) = 14 > 7.81$, the standard ARCH(1) is rejected in favor of the general model at the 5% level.
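The prediction formula of part 5 and the likelihood of part 7 translate into a few lines of Python. This is a sketch under two assumptions of our own:

- $\eta_t$ is Gaussian.
- The likelihood is conditioned on the first observation, so the sum starts at $t = 2$.

The function names are hypothetical. The last lines recompute the likelihood-ratio statistic from the two log-likelihoods in the table of question 7.

```python
import math

import numpy as np


def predict_sq(h, eps_prev, omega1, alpha1, omega2, alpha2, p):
    """E(eps_{t+h}^2 | eps_{t-1}, ...) for h >= 0, following part 5."""
    omega_bar = p * omega1 + (1 - p) * omega2
    alpha_bar = p * alpha1 + (1 - p) * alpha2
    geometric = sum(alpha_bar ** k for k in range(h + 1))  # 1 + ... + alpha_bar^h
    return omega_bar * geometric + alpha_bar ** (h + 1) * eps_prev ** 2


def log_likelihood(eps, omega1, alpha1, omega2, alpha2, p):
    """Gaussian mixture log-likelihood sum_{t>=2} log l_t, conditional on eps_1."""
    eps = np.asarray(eps, dtype=float)
    prev_sq = eps[:-1] ** 2
    var1 = omega1 + alpha1 * prev_sq           # sigma_1t^2
    var2 = omega2 + alpha2 * prev_sq           # sigma_2t^2
    sq = eps[1:] ** 2
    dens1 = np.exp(-sq / (2 * var1)) / np.sqrt(2 * math.pi * var1)
    dens2 = np.exp(-sq / (2 * var2)) / np.sqrt(2 * math.pi * var2)
    return float(np.sum(np.log(p * dens1 + (1 - p) * dens2)))


# Likelihood-ratio statistic with the log-likelihoods reported in question 7.
lr_stat = 2 * (-1268.2 - (-1275.2))
print(round(lr_stat, 1), lr_stat > 7.81)   # 14.0 True
```

With $p = 1$ the mixture collapses to the Gaussian ARCH(1) likelihood with parameters $(\omega_1, \alpha_1)$. This is a convenient sanity check for any implementation.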
Problem 5

Let $(\eta_t)$ be a sequence of iid random variables such that $E(\eta_t) = 0$. When $E|\eta_t|^m < \infty$, denote $\mu_m = E\eta_t^m$. Consider the model

$$\epsilon_t = \eta_t + b\eta_t\epsilon_{t-1}, \quad t \in \mathbb{Z}. \qquad (D.6)$$


1. Strict stationarity

(a) Let

$$Z_{t,n} = \eta_t + \sum_{i=1}^{n}b^i\eta_t\eta_{t-1}\cdots\eta_{t-i}.$$

Show that if $E\log|b\eta_t| < 0$ then the sequence $(Z_{t,n})_{n\ge1}$ converges almost surely. Under this condition, let

$$Z_t = \eta_t + \sum_{i=1}^{\infty}b^i\eta_t\eta_{t-1}\cdots\eta_{t-i}.$$

(b) Show that if $E\log|b\eta_t| < 0$ then equation (D.6) admits a nonanticipative and ergodic strictly stationary solution.


(c) We have

$$\int_{-\infty}^{\infty}\log|x|\,\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)dx = -0.635181.$$

Give the strict stationarity condition when $\eta_t \sim \mathcal{N}(0, \mu_2)$.

2. Second-order stationarity

(a) Under what condition is $(Z_{t,n})_n$ a Cauchy sequence in $L^2$?

(b) Show that $b^2\mu_2 < 1$ entails $E\log|b\eta_t| < 0$.

(c) Show that if $b^2\mu_2 < 1$ then $(\epsilon_t) = (Z_t)$ is the second-order stationary solution of (D.6).

(d) Assume that $\mu_2 \ne 0$. Show that the condition $b^2\mu_2 < 1$ is also necessary for the existence of a nonanticipative second-order stationary solution.


3. Properties of the marginal distribution and conditional moments. Assume that $b^2\mu_2 < 1$ and that $(\epsilon_t)$ is the second-order stationary solution of (D.6).

(a) Show that $(\epsilon_t)$ is a weak GARCH process whose orders will be specified.

(b) Compare the conditional variance of $(\epsilon_t)$ with that of a strong ARCH(1). Does the sign of $\epsilon_{t-1}$ have an impact on the volatility at time $t$? Is this property of interest for financial series?


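4. Estimation. Denote by $b_0$ and $\mu_{02}$ the true values of the parameters $b$ and $\mu_2$. Assume that $b_0^2\mu_{02} < 1$ and that $\epsilon_1, \ldots, \epsilon_n$ is a second-order stationary realization of model (D.6). Let

$$h_t = h_t(b, \mu_2) = \mu_2(1 + b\epsilon_{t-1})^2 \quad\text{and}\quad h_{0t} = h_t(b_0, \mu_{02}).$$

(a) What is the interpretation of $v_t = \epsilon_t^2 - h_{0t} = (\eta_t^2 - \mu_{02})(1 + b_0\epsilon_{t-1})^2$?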
(b) Assume that $E\epsilon_t^4 < \infty$. Show that, almost surely,

$$\lim_{n\to\infty}\frac{1}{n}\sum_{t=2}^{n}v_t(h_{0t} - h_t) = 0.$$

(c) Assume that the distribution of $\epsilon_t$ is not concentrated on one or two points (in particular, $\mu_2 \ne 0$). Show that

$$E(h_{0t} - h_t)^2 = 0 \quad\text{if and only if}\quad b = b_0 \text{ and } \mu_2 = \mu_{02}.$$


(d) Under the previous assumptions, consider the criterion

$$Q_n(b, \mu_2) = \frac{1}{n}\sum_{t=2}^{n}(\epsilon_t^2 - h_t)^2.$$

Show that, almost surely,

$$\lim_{n\to\infty}Q_n(b, \mu_2) \ge \lim_{n\to\infty}Q_n(b_0, \mu_{02}),$$

with equality if and only if $b = b_0$ and $\mu_2 = \mu_{02}$. Give an estimation method for the parameters.

(e) Describe the quasi-maximum likelihood method.


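5. Extension. Without giving detailed proofs, extend the stationarity and estimation results to the model

$$\epsilon_t = \eta_t + b_1\eta_t\epsilon_{t-1} + \cdots + b_q\eta_t\epsilon_{t-q}, \quad t \in \mathbb{Z}.$$

6. Further extensions. Assume that $\mu_2 \ne 0$, $\mu_3 = 0$ and $\mu_4b^4 < 1$.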


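(a) Compute the autocovariance function of $\epsilon_t^2$. Prove that $\epsilon_t^2$ follows a weak ARMA model whose orders will be given.

(b) Propose a moment estimator for $\mu_2$ and $b^2$. Show that this estimator is consistent.

(c) When they exist, compute the matrices

$$I = \lim_{n\to\infty}\mathrm{Var}\left\{\sqrt{n}\begin{pmatrix}\partial Q_n(b_0, \mu_{02})/\partial b\\ \partial Q_n(b_0, \mu_{02})/\partial\mu_2\end{pmatrix}\right\}$$

and

$$J = \lim_{n\to\infty}\begin{pmatrix}\partial^2Q_n(b_0, \mu_{02})/\partial b^2 & \partial^2Q_n(b_0, \mu_{02})/\partial b\,\partial\mu_2\\ \partial^2Q_n(b_0, \mu_{02})/\partial\mu_2\,\partial b & \partial^2Q_n(b_0, \mu_{02})/\partial\mu_2^2\end{pmatrix}.$$

What moment condition is necessary for the existence of $I$ and $J$?

(d) Give the scheme of proof which would establish that, under some assumptions, the least-squares estimator $\hat\theta$ of the parameter $\theta_0 = (b_0, \mu_{02})'$ satisfies

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma(\theta_0)),$$

where $\Sigma(\theta_0) = J^{-1}IJ^{-1}$.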
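(e) Let

$$l_t(\theta) = \log h_t + \frac{\epsilon_t^2}{h_t}, \quad h_t = \mu_2(1 + b\epsilon_{t-1})^2.$$

When it exists, compute

$$J^{QML} = E\frac{\partial^2l_t(\theta_0)}{\partial\theta\,\partial\theta'}$$

and show that

$$J^{QML} = E\left\{\frac{1}{h_t^2(\theta_0)}\frac{\partial h_t(\theta_0)}{\partial\theta}\frac{\partial h_t(\theta_0)}{\partial\theta'}\right\} = (\kappa_\eta - 1)^{-1}\mathrm{Var}\left\{\frac{\partial l_t(\theta_0)}{\partial\theta}\right\},$$

where

$$\kappa_\eta = \frac{\mu_{04}}{\mu_{02}^2}.$$

What moment condition is necessary for the existence of $J^{QML}$?

(f) Give the scheme of proof which would establish that, under some assumptions, the quasi-maximum likelihood estimator $\hat\theta^{QML}$ satisfies

$$\sqrt{n}(\hat\theta^{QML} - \theta_0) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Sigma^{QML}(\theta_0)).$$

(g) What is the particular form of $\Sigma^{QML}(\theta_0)$ at $\theta_0 = (0, \mu_{02})'$? What are the consequences for the asymptotic properties of the estimator?

(h) Compare $\Sigma(\theta_0)$ and $\Sigma^{QML}(\theta_0)$ at $\theta_0 = (0, \mu_{02})'$.

(i) Without giving detailed proofs, extend the stationarity and QML estimation results to the model

$$\epsilon_t = \eta_t + b_1\eta_t\epsilon_{t-1} + \cdots + b_q\eta_t\epsilon_{t-q}, \quad t \in \mathbb{Z}.$$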
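The constant quoted in 1(c) is $E\log|X|$ for $X \sim \mathcal{N}(0, 1)$. It has the closed form $-(\gamma + \log 2)/2$, with $\gamma$ here denoting Euler's constant. The sketch below checks this value and adds a Monte Carlo cross-check. It also encodes the strict and second-order stationarity conditions for Gaussian $\eta_t \sim \mathcal{N}(0, \mu_2)$. The helper names are hypothetical, and the seed and number of draws are arbitrary.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329

# Closed form: for X ~ N(0, 1), E log|X| = -(gamma + log 2) / 2.
e_log_abs = -(EULER_GAMMA + math.log(2.0)) / 2
print(round(e_log_abs, 6))                     # -0.635181

# Monte Carlo cross-check of the same integral (fixed seed, 10^6 draws).
rng = np.random.default_rng(0)
mc = float(np.mean(np.log(np.abs(rng.standard_normal(1_000_000)))))


def strict_condition(b, mu2):
    """E log|b eta_t| < 0 for eta_t ~ N(0, mu2), i.e. |b| sqrt(mu2) < exp(-E log|X|)."""
    return abs(b) * math.sqrt(mu2) < math.exp(-e_log_abs)


def second_order_condition(b, mu2):
    return b * b * mu2 < 1
```

For instance, $b = 1.5$ and $\mu_2 = 1$ satisfy the strict condition but not the second-order one.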
Solution

1. (a) In view of the Cauchy root test, it suffices to show that almost surely

$$\lim_{i\to\infty}|b^i\eta_{t-1}\cdots\eta_{t-i}|^{1/i} < 1.$$

By the law of large numbers, this limit is equal to

$$\lim_{i\to\infty}\exp\left\{\frac{1}{i}\sum_{k=1}^{i}\log|b\eta_{t-k}|\right\} = \exp\{E\log|b\eta_t|\},$$

which shows the result.
(b) For all $n$, we have

$$Z_{t,n} = \eta_t + b\eta_tZ_{t-1,n-1}.$$

Taking the limit gives $Z_t = \eta_t + b\eta_tZ_{t-1}$, which shows that $(\epsilon_t) = (Z_t)$ is a nonanticipative solution of (D.6). Since $Z_t = f(\eta_t, \eta_{t-1}, \ldots)$, where $f : \mathbb{R}^{\infty} \to \mathbb{R}$ is measurable, and $(\eta_t)$ is ergodic and stationary, $(Z_t)$ is also stationary and ergodic.


(c) We have $E\log|\eta_t/\sqrt{\mu_2}| = -0.635181$. The stationarity condition is thus written as

$$E\log|b\eta_t| = \log|b\sqrt{\mu_2}| + E\log\left|\frac{\eta_t}{\sqrt{\mu_2}}\right| < 0,$$

or equivalently

$$|b|\sqrt{\mu_2} < \exp\{0.635181\} = 1.88736.$$


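2. (a) For $n < m$, the terms of the sum being orthogonal, we have

$$E(Z_{t,m} - Z_{t,n})^2 = \mu_2\sum_{i=n+1}^{m}(b^2\mu_2)^i.$$

This tends to 0 as $n, m \to \infty$ (that is, the sequence is a Cauchy sequence) if and only if $b^2\mu_2 < 1$.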


(b) If $b^2\mu_2 < 1$ then, using Jensen's inequality, we have

$$E\log|b\eta_t| = \frac{1}{2}E\log b^2\eta_t^2 \le \frac{1}{2}\log Eb^2\eta_t^2 = \frac{1}{2}\log b^2\mu_2 < 0.$$


(c) When $b^2\mu_2 < 1$, we have seen that $(Z_{t,n})_n$ is a Cauchy sequence. Thus it converges in $L^2$ to some limit $\tilde Z_t$. It also converges almost surely to $Z_t$. Thus $Z_t = \tilde Z_t$ almost surely, and $EZ_t^2 < \infty$.

To show the uniqueness of the solution, assume the existence of two second-order stationary solutions $(Z_t)$ and $(Z_t^*)$. Then, for any $n \ge 1$,

$$Z_t - Z_t^* = b\eta_t(Z_{t-1} - Z_{t-1}^*) = b^n\eta_t\eta_{t-1}\cdots\eta_{t-n+1}(Z_{t-n} - Z_{t-n}^*).$$

By the Cauchy–Schwarz and triangle inequalities,

$$E|Z_t - Z_t^*| \le (|b|\sqrt{\mu_2})^n\{\|Z_t\|_2 + \|Z_t^*\|_2\}.$$

Since this is true for all $n$, the condition $b^2\mu_2 < 1$ entails $E|Z_t - Z_t^*| = 0$, which implies $Z_t = Z_t^*$ almost surely.
(d) If $\epsilon_t$ is a nonanticipative second-order stationary solution then, using $E\epsilon_{t-1} = 0$ and the independence of $\eta_t$ and $\epsilon_{t-1}$,

$$E\epsilon_t^2 = \mu_2 + b^2\mu_2E\epsilon_t^2,$$

that is,

$$(1 - b^2\mu_2)E\epsilon_t^2 = \mu_2.$$

If $b^2\mu_2$ were greater than or equal to 1, the left-hand side of this equality would be nonpositive. This is impossible because the right-hand side is strictly positive.


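3. (a) Such a solution is nonanticipative and satisfies

$$E\epsilon_t = E\eta_t + bE\eta_tE\epsilon_{t-1} = 0, \qquad E\epsilon_t^2 = \frac{\mu_2}{1 - b^2\mu_2}, \qquad \mathrm{Cov}(\epsilon_t, \epsilon_{t-h}) = 0, \ \forall h > 0.$$

It is thus a white noise. Let us show that $(\epsilon_t^2)$ is an ARMA process. Using the independence between $\eta_t$ and $\epsilon_{t-1}$, and $E(\eta_t^2) = \mu_2$, we have, for $k > 0$,

$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-k}^2) = \mathrm{Cov}(\eta_t^2 + 2b\eta_t^2\epsilon_{t-1} + b^2\eta_t^2\epsilon_{t-1}^2, \epsilon_{t-k}^2)$$
$$= 2b\,\mathrm{Cov}(\eta_t^2\epsilon_{t-1}, \epsilon_{t-k}^2) + b^2\mathrm{Cov}(\eta_t^2\epsilon_{t-1}^2, \epsilon_{t-k}^2)$$
$$= 2b\mu_2\mathrm{Cov}(\epsilon_{t-1}, \epsilon_{t-k}^2) + b^2\mu_2\mathrm{Cov}(\epsilon_{t-1}^2, \epsilon_{t-k}^2).$$

For $k > 1$,

$$\mathrm{Cov}(\epsilon_{t-1}, \epsilon_{t-k}^2) = E(\epsilon_{t-1}\epsilon_{t-k}^2) = E\{\eta_{t-1}(1 + b\epsilon_{t-2})\epsilon_{t-k}^2\} = 0.$$

It follows that, for $k > 1$,

$$\mathrm{Cov}(\epsilon_t^2, \epsilon_{t-k}^2) = b^2\mu_2\mathrm{Cov}(\epsilon_{t-1}^2, \epsilon_{t-k}^2),$$

which shows that $(\epsilon_t^2)$ admits an ARMA(1, 1) representation. Finally, $(\epsilon_t)$ admits a weak GARCH(1, 1) representation.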


(b) The volatility of the model is

$$\mu_2(1 + b\epsilon_{t-1})^2 = \mu_2 + b^2\mu_2\epsilon_{t-1}^2 + 2b\mu_2\epsilon_{t-1},$$

whereas it is of the form

$$\omega + \alpha\epsilon_{t-1}^2$$

for an ARCH(1). The sign of $\epsilon_{t-1}$ is thus important. If $b < 0$, a negative return $\epsilon_{t-1}$ increases the volatility more than the positive return $-\epsilon_{t-1} > 0$ does. Such an asymmetry in the shocks is observed in real series, but standard GARCH models do not take it into account.


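4. (a) Since $h_{0t}$ is the conditional expectation of $\epsilon_t^2$, $v_t$ is the strong innovation of $\epsilon_t^2$.

(b) The process $\{v_t(h_{0t} - h_t)\}_t$ is ergodic and stationary, by arguments already used. The ergodic theorem entails that

$$\lim_{n\to\infty}\frac{1}{n}\sum_{t=2}^{n}v_t(h_{0t} - h_t) = E(\eta_t^2 - \mu_{02})\,E\{(1 + b_0\epsilon_{t-1})^2(h_{0t} - h_t)\} = 0 \quad\text{a.s.},$$

because $(1 + b_0\epsilon_{t-1})^2(h_{0t} - h_t)$, as a measurable function of $\{\eta_u, u \le t - 1\}$, is independent of $\eta_t^2 - \mu_{02}$.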
(c) We have $E(h_{0t} - h_t)^2 = 0$ if and only if

$$h_{0t} - h_t = (\mu_{02}b_0^2 - \mu_2b^2)\epsilon_{t-1}^2 + 2(\mu_{02}b_0 - \mu_2b)\epsilon_{t-1} + (\mu_{02} - \mu_2) = 0 \quad\text{a.s.}$$

The distribution of $\epsilon_{t-1}$ (that is, of $\epsilon_t$, by stationarity) is not concentrated on one or two points. This second-order polynomial in $\epsilon_{t-1}$ can therefore vanish almost surely only if its coefficients are null, that is, if and only if $b = b_0$ and $\mu_2 = \mu_{02}$.


(d) Using the last two questions, almost surely,

$$\lim_{n\to\infty}Q_n(b, \mu_2) = \lim_{n\to\infty}\frac{1}{n}\sum_{t=2}^{n}(\epsilon_t^2 - h_{0t} + h_{0t} - h_t)^2$$
$$= \lim_{n\to\infty}Q_n(b_0, \mu_{02}) + E(h_{0t} - h_t)^2 + 0$$
$$\ge \lim_{n\to\infty}Q_n(b_0, \mu_{02}),$$

with equality if and only if $b = b_0$ and $\mu_2 = \mu_{02}$. This suggests looking for a value of $(b, \mu_2)$ minimizing the criterion $Q_n(b, \mu_2)$. This is the least-squares method.


(e) If $\eta_t$ is $\mathcal{N}(0, \mu_{02})$ distributed then the distribution of $\epsilon_t$ given $\{\epsilon_u, u < t\}$ is $\mathcal{N}(0, h_{0t})$. Given the initial value $\epsilon_1$, the quasi-log-likelihood of $\epsilon_2, \ldots, \epsilon_n$ is thus

$$L_n(b, \mu_2) = -\frac{n-1}{2}\log 2\pi - \frac{1}{2}\sum_{t=2}^{n}\left\{\log h_t + \frac{\epsilon_t^2}{h_t}\right\}.$$

A quasi-maximum likelihood estimator satisfies

$$(\hat b, \hat\mu_2) = \arg\max_{(b, \mu_2)\in\Theta}L_n(b, \mu_2),$$

where $\Theta \subset \mathbb{R}\times\,]0, \infty[$ is the parameter space. If $\Theta$ is a compact set, since the criterion is continuous, there always exists at least one QMLE.
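The least-squares method suggested in 4(d) is easy to implement for (D.6). Because $h_t$ is linear in $\mu_2$, the minimizer in $\mu_2$ for fixed $b$ is available in closed form. A one-dimensional grid search over $b$ is therefore enough for a sketch. The following are assumptions made for illustration and are not from the text:

- the innovations are Gaussian;
- the grid covers $[-0.99, 0.99]$;
- the simulated values $b_0 = -0.5$, $\mu_{02} = 1$ are arbitrary;
- the function names are hypothetical.

```python
import numpy as np


def simulate_d6(n, b, mu2, seed=0, burn=500):
    """eps_t = eta_t + b eta_t eps_{t-1}, eta_t ~ N(0, mu2) (Gaussian is an assumption)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, np.sqrt(mu2), n + burn)
    eps = np.empty(n + burn)
    prev = 0.0
    for t in range(n + burn):
        prev = eta[t] * (1.0 + b * prev)
        eps[t] = prev
    return eps[burn:]


def least_squares(eps, b_grid=None):
    """Minimise Q_n(b, mu2), up to its 1/n factor, over a grid of b values.

    For fixed b, Q_n is quadratic in mu2 and its minimiser has a closed form.
    """
    if b_grid is None:
        b_grid = np.linspace(-0.99, 0.99, 1981)   # grid range is an assumption
    y = eps[1:] ** 2
    best_q, best_b, best_mu2 = np.inf, None, None
    for b in b_grid:
        g = (1.0 + b * eps[:-1]) ** 2            # h_t / mu2
        mu2 = np.dot(g, y) / np.dot(g, g)        # argmin over mu2 for this b
        q = np.mean((y - mu2 * g) ** 2)
        if q < best_q:
            best_q, best_b, best_mu2 = q, b, mu2
    return best_b, best_mu2


eps = simulate_d6(20_000, b=-0.5, mu2=1.0, seed=1)
b_hat, mu2_hat = least_squares(eps)
print(b_hat, mu2_hat)
```

The values $b_0 = -0.5$ and $\mu_{02} = 1$ satisfy $\mu_4b_0^4 < 1$ for Gaussian $\eta_t$, so the fourth-order moments needed in part 6 exist.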
Problem 6

Consider the model

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega(\eta_{t-1}) + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \qquad (D.7)$$

where $(\eta_t)$ is an iid sequence of random variables with mean 0, variance 1 and finite moments of order at least 4, and where $\omega(\cdot)$ is a function with strictly positive values. Let $\bar\omega = E\{\omega(\eta_t)\}$.


1. What is the difference between this model and the standard GARCH(1, 1) model, and why is it of interest for the modeling of financial series? An example of a simulated trajectory of the model is given in Figure D.1.


2. Strict stationarity

(a) Show that under the assumption

$$E\log a(\eta_t) < 0,$$

where $a$ is a function that will be specified, the model admits a unique nonanticipative strictly stationary solution.

(b) Show that if $E\log a(\eta_t) > 0$, the model does not admit a strictly stationary solution.


3. Second-order stationarity, kurtosis

(a) Establish the necessary and sufficient condition for the existence of a second-order stationary solution, and compute $E(\epsilon_t^2)$. Prove that the process has the same second-order properties as a standard GARCH(1, 1) (that is, with $\omega(\cdot)$ constant).

(b) Assuming that the fourth-order moments exist, compare the kurtosis coefficients of these processes.

4. Asymmetries. Give an example of a specification of $\omega$ that can take into account the usual asymmetry property of financial series.


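5. ARMA representations

(a) Denote by $\nu_t = \epsilon_t^2 - E(\epsilon_t^2 \mid \epsilon_u, u < t)$ the innovation of $\epsilon_t^2$. Show that, under assumptions to be specified, we have

$$\epsilon_t^2 = \bar\omega + (\alpha + \beta)\epsilon_{t-1}^2 + u_t, \quad\text{where}\quad u_t = \nu_t - \beta\nu_{t-1} + \omega(\eta_{t-1}) - \bar\omega.$$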


Figure D.1 Simulated trajectory of model (D.7) with $\alpha = 0.2$, $\beta = 0.5$ and $\omega(\eta_{t-1}) = 4$ (left), and with a nonconstant $\omega(\eta_{t-1})$ (right), for the same sequence of variables $\eta_t \sim \mathcal{N}(0, 1)$.

(b) Show that $(u_t)$ is an MA(1) process. Prove that $\epsilon_t^2$ admits an ARMA(1, 1) representation. Is this representation different from that obtained for the standard GARCH(1, 1) such that $\omega(\eta_{t-1}) = \bar\omega$? (The case $\beta = 0$ can be considered.)


6. Estimation and tests

(a) With the aid of part 5, note that the autocorrelation function $\rho(h)$ of the process $(\epsilon_t^2)$ satisfies $\rho(h) = (\alpha + \beta)\rho(h - 1)$ for $h > 1$. Give a simple estimator of $\alpha + \beta$ that does not depend on the specification of $\omega$. The following values were obtained for the first empirical autocorrelations of $(\epsilon_t^2)$:

$$\hat\rho(1) = 0.445, \quad \hat\rho(2) = 0.219, \quad \hat\rho(3) = 0.110, \quad \hat\rho(4) = 0.056.$$

Give an estimate of $\alpha + \beta$. Is a standard ARCH(1) model ($\beta = 0$ and $\omega$ constant) plausible for these data?

(b) Assume that the function $\omega(\cdot)$ is parameterized by some parameter $\gamma$: for example, $\omega(\eta_{t-1}) = 1 + \gamma\eta_{t-1}^2$ with $\gamma > 0$. Consider the estimation of $\theta = (\gamma, \alpha, \beta)'$ using the observations $\epsilon_1, \ldots, \epsilon_n$. Write down the quasi-maximum likelihood criterion, given initial values for $\epsilon_u$, $u < 1$.

7. Extension. Outline how the previous results are modified if $\omega(\eta_{t-1})$ is replaced by $\omega(\eta_{t-k})$ in (D.7), with $k > 1$.
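Question 6(a) only needs the ratios of successive empirical autocorrelations. For $h > 1$, each ratio $\hat\rho(h)/\hat\rho(h-1)$ estimates the autoregressive coefficient of the ARMA(1, 1) representation of part 5. The minimal sketch below simply averages the three available ratios. Averaging is our choice; a least-squares fit through the origin would be another option.

```python
# Empirical autocorrelations of eps_t^2 reported in question 6(a), lags 1 to 4.
rho_hat = [0.445, 0.219, 0.110, 0.056]

# Successive ratios rho(h) / rho(h - 1), h = 2, 3, 4; each estimates the common
# autoregressive coefficient of the ARMA(1, 1) representation.
ratios = [rho_hat[h] / rho_hat[h - 1] for h in range(1, len(rho_hat))]
phi_hat = sum(ratios) / len(ratios)

print([round(r, 3) for r in ratios], round(phi_hat, 3))  # [0.492, 0.502, 0.509] 0.501
```

Under a standard ARCH(1), $\rho(1)$ would itself equal this common ratio. Comparing it with $\hat\rho(1) = 0.445$ therefore gives one informal check.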
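Problem 7

Consider the ARCH models

$$\text{(I)} \quad \epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2,$$

$$\text{(II)} \quad \epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega + \alpha\epsilon_{t-2}^2,$$

where $\omega > 0$, $\alpha \ge 0$, and $(\eta_t)$ denotes an iid sequence of random variables such that $E(\eta_t) = 0$ and $\mathrm{Var}(\eta_t) = 1$.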


1. Show that the strict stationarity condition is the same for the two models, and show that this condition implies the existence of a unique strictly stationary nonanticipative solution. For each model, give the unique strictly stationary nonanticipative solution. Prove that, when it exists, the expectation $Ef(\epsilon_t)$ of any given function $f$ of $\epsilon_t$ is the same in the two models.

2. From the observation of the empirical autocovariances of $\epsilon_t^2$, how can we determine whether the data generating process is model (I) or model (II)?

3. For model (II), write down the likelihood and the equations that allow $\omega$ and $\alpha$ to be estimated (without trying to solve these equations). Recall that the asymptotic variance of the quasi-maximum likelihood estimator is $(E\eta_t^4 - 1)J^{-1}$, where

$$J = E\left\{\frac{1}{\sigma_t^4(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta'}\right\}.$$

Compare the asymptotic variances of the estimators of $\theta = (\omega, \alpha)'$ in the two models. In each model, how is the hypothesis $\alpha = 0$ tested?
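The two models can be told apart from the autocorrelations of $\epsilon_t^2$:

- In model (I), $\epsilon_t^2$ is an AR(1), so $\rho(h) = \alpha^h$.
- In model (II), it is an AR(2) with coefficients $(0, \alpha)$, so odd lags vanish.

The sketch below assumes that model (II) has conditional variance $\omega + \alpha\epsilon_{t-2}^2$ and that $\eta_t$ is Gaussian. It also assumes $E\epsilon_t^4 < \infty$, which for Gaussian $\eta_t$ requires $3\alpha^2 < 1$, as for an ARCH(1); $\alpha = 0.3$ satisfies this. Function names, seed and sample size are hypothetical choices.

```python
import numpy as np


def acf_sq_theory(alpha, max_lag, model):
    """Autocorrelations of eps_t^2 at lags 1..max_lag, assuming E eps_t^4 < infinity."""
    lags = np.arange(1, max_lag + 1)
    if model == "I":
        return alpha ** lags                               # rho(h) = alpha^h
    return np.where(lags % 2 == 0, alpha ** (lags / 2), 0.0)  # 0 at odd lags


def simulate(n, omega, alpha, lag, seed=0, burn=500):
    """sigma_t^2 = omega + alpha eps_{t-lag}^2 with eta_t ~ N(0, 1) (an assumption)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + burn + lag)
    eps = np.zeros(n + burn + lag)
    for t in range(lag, n + burn + lag):
        eps[t] = np.sqrt(omega + alpha * eps[t - lag] ** 2) * eta[t]
    return eps[-n:]


def sample_acf(x, max_lag):
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[h:], x[:-h]) / denom for h in range(1, max_lag + 1)])


eps2 = simulate(50_000, 1.0, 0.3, lag=2, seed=4) ** 2   # model (II)
print(np.round(sample_acf(eps2, 4), 3), acf_sq_theory(0.3, 4, "II"))
```

A sample autocorrelation of $\epsilon_t^2$ close to zero at lag 1 but clearly positive at lag 2 points to model (II).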
Problem 8

Exercise 1: Recall that the autocorrelation function $\rho(\cdot)$ of a second-order stationary ARMA$(p, q)$ process satisfies

$$\rho(h) = \sum_{i=1}^{p}\phi_i\rho(h - i), \quad h > q,$$

where the $\phi_i$ are the AR coefficients. Denote by $B$ the lag operator: $BX_t = X_{t-1}$. Let $(\epsilon_t)$ be a strictly stationary solution of a GARCH$(p, q)$ model such that $E(\epsilon_t^4) < \infty$.


1. Show that $(\epsilon_t^2)$ admits an ARMA representation. What is the relation satisfied by the autocorrelation function $\rho_{\epsilon^2}$ of this process?

2. The aim of this question is to check that the function $\rho_{\epsilon^2}$ has positive values.

(a) Show the property when $p = q = 1$.

(b) Using the ARMA representation of $\epsilon_t^2$, show that there exist constants $c_i$ and a noise $\nu_t$ such that

$$\epsilon_t^2 = c_0 + \nu_t + \sum_{i=1}^{\infty}c_i\nu_{t-i}.$$


(c) Let $P$ and $Q$ be two polynomials with positive coefficients such that $P(0) = Q(0) = 0$. Assume that $1 - Q(z) = 0$ implies $|z| > 1$. Show by induction that $\{1 - Q(B)\}^{-1}P(B)$ is a series in $B$ with positive terms.

(d) Prove that the coefficients $c_i$ are positive.

(e) Deduce the property.

3. The aim of this question is to show that the function $\rho_{\epsilon^2}$ is not always decreasing.

(a) Verify that for a GARCH(1, 1) model the function $\rho_{\epsilon^2}$ is decreasing.

(b) Assume that $q = 2$ and $p = 0$, that is, $\epsilon_t = \sqrt{\omega + \alpha_1\epsilon_{t-1}^2 + \alpha_2\epsilon_{t-2}^2}\,\eta_t$.

(i) What is the relationship between $\rho_{\epsilon^2}(h)$, $\rho_{\epsilon^2}(h - 1)$ and $\rho_{\epsilon^2}(h - 2)$ for $h > 0$?

(ii) Deduce expressions for $\rho_{\epsilon^2}(1)$ and $\rho_{\epsilon^2}(2)$ as functions of $\alpha_1$ and $\alpha_2$.

(iii) Prove that for some set of values of $\alpha_1$ and $\alpha_2$, the function $\rho_{\epsilon^2}$ is not decreasing.
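For question 3(b), the recursion in (i) determines the whole autocorrelation function once $\rho_{\epsilon^2}(1)$ is known. $\rho_{\epsilon^2}(1)$ itself follows from the case $h = 1$, using $\rho(-1) = \rho(1)$. The function name is hypothetical. The values $\alpha_1 = 0.1$, $\alpha_2 = 0.5$ are illustrative and are assumed to satisfy $E\epsilon_t^4 < \infty$.

```python
def arch2_acf(alpha1, alpha2, max_lag):
    """rho_{eps^2}(h), h = 0..max_lag, for an ARCH(2) with E eps_t^4 < infinity.

    For h > 0: rho(h) = alpha1 rho(h-1) + alpha2 rho(h-2), with rho(0) = 1 and
    rho(-1) = rho(1); the case h = 1 gives rho(1) = alpha1 / (1 - alpha2).
    """
    rho = [1.0, alpha1 / (1.0 - alpha2)]
    for h in range(2, max_lag + 1):
        rho.append(alpha1 * rho[h - 1] + alpha2 * rho[h - 2])
    return rho[: max_lag + 1]


rho = arch2_acf(0.1, 0.5, 4)
print([round(r, 4) for r in rho])  # [1.0, 0.2, 0.52, 0.152, 0.2752]
```

The output is not monotone, since $\rho(2) > \rho(1)$ and $\rho(4) > \rho(3)$. This is the phenomenon that (iii) asks to establish.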


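Exercise 2: Consider the ARCH($q$) model

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t^2 = \omega_0 + \sum_{i=1}^{q}\alpha_{0i}\epsilon_{t-i}^2,$$

where $\omega_0 > 0$, $\alpha_{0i} \ge 0$, $i = 1, \ldots, q$, and $(\eta_t)$ is an iid sequence such that $E(\eta_t) = 0$ and $\mathrm{Var}(\eta_t) = 1$. Let $\epsilon_1, \ldots, \epsilon_n$ be $n$ observations of the process $(\epsilon_t)$ and let $\epsilon_0, \ldots, \epsilon_{1-q}$ be initial values. Introduce the vector $Z_{t-1} \in \mathbb{R}^{q+1}$ defined by $Z_{t-1}' = (1, \epsilon_{t-1}^2, \ldots, \epsilon_{t-q}^2)$, together with the $n \times (q+1)$ matrix $X$ and the $n \times 1$ vector $Y$ given by

$$X = \begin{pmatrix}1 & \epsilon_0^2 & \cdots & \epsilon_{1-q}^2\\ \vdots & \vdots & & \vdots\\ 1 & \epsilon_{n-1}^2 & \cdots & \epsilon_{n-q}^2\end{pmatrix} = \begin{pmatrix}Z_0'\\ \vdots\\ Z_{n-1}'\end{pmatrix}, \qquad Y = \begin{pmatrix}\epsilon_1^2\\ \vdots\\ \epsilon_n^2\end{pmatrix}.$$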
Denote by $\theta_0 = (\omega_0, \alpha_{01}, \ldots, \alpha_{0q})'$ the true value of the parameter.

1. Show that the OLS estimator $\hat\theta = (\hat\omega, \hat\alpha_1, \ldots, \hat\alpha_q)'$ of $\theta_0$ is given by

$$\hat\theta = (X'X)^{-1}X'Y.$$

We use the notation $\sigma_t^2(\theta) = \omega + \sum_{i=1}^{q}\alpha_i\epsilon_{t-i}^2$ and $\eta_t(\theta) = \{\sigma_t(\theta)\}^{-1}\epsilon_t$.

2. Give conditions ensuring the following almost sure convergences as $n \to \infty$:

$$\frac{1}{n}X'X \to E_{\theta_0}(Z_{t-1}Z_{t-1}'), \qquad \frac{1}{n}X'Y \to E_{\theta_0}(Z_{t-1}\epsilon_t^2).$$

3. Prove that $\hat\theta$ converges almost surely to $\theta_0$.


4. In order to take into account the conditional heteroscedasticity, define the weighted least-squares 
estimator 


where 


(a) Without going into all the mathematical details, justify the introduction of such an estimator. 


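(b) Show that

$$\sqrt{n}(\tilde\theta - \theta_0) = \left\{\frac{1}{n}\sum_{t=1}^{n}\frac{Z_{t-1}Z_{t-1}'}{\sigma_t^4(\hat\theta)}\right\}^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{\sigma_t^2Z_{t-1}(\eta_t^2 - 1)}{\sigma_t^4(\hat\theta)}, \qquad (D.8)$$

where $\tilde\theta$ is the weighted least-squares estimator, $\sigma_t^2 = \sigma_t^2(\theta_0)$ and $\eta_t = \eta_t(\theta_0)$.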


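(c) Let $J = E(\sigma_t^{-4}Z_{t-1}Z_{t-1}')$. Justify the following results:

$$\frac{1}{n}\sum_{t=1}^{n}\sigma_t^{-4}Z_{t-1}Z_{t-1}' \to J \quad\text{a.s.},$$

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma_t^{-2}Z_{t-1}(\eta_t^2 - 1) \xrightarrow{\mathcal{L}} \mathcal{N}\{0, (E\eta_t^4 - 1)J\}.$$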
t=1 


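(d) Assume that the asymptotic distribution of the right-hand side of (D.8) does not change when $\sigma_t^2(\hat\theta)$ is replaced by $\sigma_t^2$. Deduce the asymptotic distribution of $\sqrt{n}(\tilde\theta - \theta_0)$, where $\tilde\theta$ denotes the weighted least-squares estimator.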
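The OLS estimator of question 1 and a weighted variant are easy to compute with NumPy. The definition of the weighted estimator in question 4 is not legible here. The sketch therefore assumes weights $\sigma_t^{-4}(\hat\theta)$ computed from the OLS first step. This weighting, the Gaussian innovations, the simulated values $\omega_0 = 1$, $\alpha_{01} = 0.2$ and all function names are assumptions rather than the text's definitions.

```python
import numpy as np


def simulate_arch(n, omega, alphas, seed=0, burn=500):
    """ARCH(q) path with eta_t ~ N(0, 1) (an assumption); returns n values."""
    rng = np.random.default_rng(seed)
    q = len(alphas)
    total = n + burn + q
    eta = rng.standard_normal(total)
    eps = np.zeros(total)
    for t in range(q, total):
        past_sq = eps[t - q:t][::-1] ** 2      # eps_{t-1}^2, ..., eps_{t-q}^2
        eps[t] = np.sqrt(omega + np.dot(alphas, past_sq)) * eta[t]
    return eps[-n:]


def design(eps, q):
    """Rows Z'_{t-1} = (1, eps_{t-1}^2, ..., eps_{t-q}^2) and Y = (eps_t^2);
    the first q observations play the role of the initial values."""
    e2 = np.asarray(eps, dtype=float) ** 2
    m = e2.size - q
    cols = [np.ones(m)] + [e2[q - i:q - i + m] for i in range(1, q + 1)]
    return np.column_stack(cols), e2[q:]


def ols_and_wls(eps, q):
    X, Y = design(eps, q)
    theta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # (X'X)^{-1} X'Y
    # Weights 1 / sigma_t^4(theta_ols); the floor 1e-8 is an ad hoc safeguard
    # against nonpositive fitted variances, not part of the text.
    sigma2 = np.maximum(X @ theta_ols, 1e-8)
    theta_wls = np.linalg.lstsq(X / sigma2[:, None], Y / sigma2, rcond=None)[0]
    return theta_ols, theta_wls


eps = simulate_arch(50_000, omega=1.0, alphas=[0.2], seed=2)
theta_ols, theta_wls = ols_and_wls(eps, q=1)
print(theta_ols, theta_wls)
```

Dividing each row of $X$ and $Y$ by $\sigma_t^2(\hat\theta)$ before running ordinary least squares is the same as weighting the squared residuals by $\sigma_t^{-4}(\hat\theta)$.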
Problem 9

For any random variable $X$, denote

$$X^+ = \max\{X, 0\}, \qquad X^- = \min\{X, 0\}.$$

Thus $X^+ \ge 0$ and $X^- \le 0$ almost surely. Consider the threshold GARCH(1, 1) model, or TGARCH(1, 1), defined by

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t = \omega + \alpha_+\epsilon_{t-1}^+ - \alpha_-\epsilon_{t-1}^- + \beta\sigma_{t-1}, \qquad (D.9)$$

where $(\eta_t)$ is a centred iid sequence with variance 1, $\omega$ is strictly positive, and $\alpha_+$, $\alpha_-$ and $\beta$ are nonnegative numbers. In model (D.9), the parameter $\beta$ is called the shock persistence parameter.

In order to introduce some asymmetry in this persistence parameter, consider a model in which it takes a different value when $\epsilon_{t-1} < c\sigma_{t-1}$ and when $\epsilon_{t-1} \ge c\sigma_{t-1}$, for some constant $c$. In other words, the persistence depends on whether the price fell abnormally or not. The model is

$$\epsilon_t = \sigma_t\eta_t, \qquad \sigma_t = \omega + \alpha_+(\epsilon_{t-1} - c\sigma_{t-1})^+ - \alpha_-(\epsilon_{t-1} - c\sigma_{t-1})^- + \beta\sigma_{t-1}. \qquad (D.10)$$


1. Explain briefly the difference between the TGARCH model defined by (D.9) and the standard 
GARCH(1, 1) model, and why the TGARCH model might be of interest for financial series 
modeling. 


2. Rewrite (D.10) to introduce an asymmetric persistence parameter. Why might such a model be 
of interest? 


3. Study of the TGARCH model defined by (D.9)

(a) Give expressions for $\epsilon_t^+$ and $\epsilon_t^-$ as functions of $\sigma_t$, $\eta_t^+$ and $\eta_t^-$.

(b) Show that $\sigma_t = \omega + a(\eta_{t-1})\sigma_{t-1}$, where $a$ is a function that will be specified.

(c) Give a sufficient condition for the existence of a nonanticipative and ergodic strictly stationary solution.

(d) Specify this stationarity condition when $\beta = 0$ and $\eta_t$ is $\mathcal{N}(0, 1)$ distributed.

(e) Give a necessary condition for the existence of a nonanticipative stationary solution such that $E\sigma_t < \infty$. Give $E\sigma_t$, when it exists.

(f) Give a necessary condition for the existence of a nonanticipative stationary solution such that $E\sigma_t^2 < \infty$. Give $E\sigma_t^2$, when it exists.

(g) Assume that $\eta_t$ has a symmetric distribution. When they exist, give the almost sure limits of

$$\frac{1}{n}\sum_{t=2}^{n}\epsilon_t^+\epsilon_{t-1}^+ \quad\text{and}\quad \frac{1}{n}\sum_{t=2}^{n}\epsilon_t^+\epsilon_{t-1}^-$$

as $n \to \infty$. Give a simple empirical method for checking whether $\alpha_+ < \alpha_-$ (do not go into the details of the test).
4. Study of the model defined by (D.10)

(a) Give a sufficient condition for the existence of a nonanticipative and ergodic strictly stationary solution.

(b) Give a necessary and sufficient condition for the existence of a nonanticipative stationary solution such that $E\sigma_t < \infty$.

(c) Assume that $\eta_t$ has a strictly positive density on $\mathbb{R}$. Show that, except in the degenerate case where $\alpha_+ = \alpha_- = 0$, the model is identifiable. That is, writing $\sigma_t = \sigma_t(\theta)$, where $\theta = (\omega, \alpha_+, \alpha_-, c, \beta)$ denotes the 'true' value of the parameter, we have

$$\sigma_t(\theta) = \sigma_t(\theta^*) \text{ a.s.} \quad\text{if and only if}\quad \theta = \theta^*.$$

(d) Give a method for estimating the parameter $\theta$.
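For 3(d), the condition derived in the solution, $\alpha_+\alpha_- < \exp(-2E\log|\eta_1|)$, becomes explicit for Gaussian $\eta_t$. In that case $E\log|\eta_t| = -(\gamma + \log 2)/2$, with $\gamma$ here denoting Euler's constant, so the bound equals $2e^{\gamma}$. A minimal sketch, with a hypothetical function name:

```python
import math

EULER_GAMMA = 0.5772156649015329

# E log|eta| for eta ~ N(0, 1) has the closed form -(gamma + log 2) / 2.
e_log_abs = -(EULER_GAMMA + math.log(2.0)) / 2
bound = math.exp(-2 * e_log_abs)          # equals 2 * exp(gamma)


def tarch_strictly_stationary(alpha_plus, alpha_minus):
    """Strict stationarity of (D.9) with beta = 0 and Gaussian eta_t."""
    return alpha_plus * alpha_minus < bound


print(round(bound, 4))                     # 3.5621
```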
Solution

1. In the standard GARCH formulation, the conditional variance $\sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2$ does not depend on the sign of $\epsilon_{t-1}$. In the TGARCH formulation with $\alpha_- > \alpha_+$, a negative return $\epsilon_{t-1}$ entails a greater increase in volatility than a positive return of the same magnitude. Empirical studies have shown the existence of such asymmetries in most financial series.


2. We have

$$\sigma_t = \begin{cases}\omega + \alpha_+\epsilon_{t-1} + (\beta - c\alpha_+)\sigma_{t-1} & \text{if } \eta_{t-1} \ge c,\\ \omega - \alpha_-\epsilon_{t-1} + (\beta + c\alpha_-)\sigma_{t-1} & \text{if } \eta_{t-1} < c.\end{cases}$$

In this model the persistence coefficient is equal to $\beta - c\alpha_+$ when $\eta_{t-1} \ge c$ (that is, when $\epsilon_{t-1} \ge c\sigma_{t-1}$), and to $\beta + c\alpha_-$ when $\eta_{t-1} < c$.

One motivation for this model is that a negative shock should increase volatility more than a positive one of the same amplitude, and that this increase should last longer. Testing the assumption $c = 0$ amounts to testing whether negative and positive shocks have the same persistence.


3. (a) The positivity of the coefficients guarantees that $\sigma_t > 0$ (starting from an initial value $\sigma_0 > 0$). We thus have $\epsilon_t^+ = \sigma_t\eta_t^+$ and $\epsilon_t^- = \sigma_t\eta_t^-$.

(b) In view of (a), $\sigma_t = \omega + a(\eta_{t-1})\sigma_{t-1}$, where $a(\eta) = \alpha_+\eta^+ - \alpha_-\eta^- + \beta$.

(c) For all $n \ge 1$, let

$$s_t(n) = \omega\left\{1 + \sum_{i=1}^{n}\prod_{k=1}^{i}a(\eta_{t-k})\right\}.$$

We have $s_t(n) = \omega + a(\eta_{t-1})s_{t-1}(n - 1)$ for all $n$. If $s_t = \lim_{n\to\infty}s_t(n)$ exists, then the previous equality still holds at the limit, and the solution is given by $\sigma_t = s_t$. Stationarity and ergodicity follow from the fact that $s_t = f(\eta_{t-1}, \eta_{t-2}, \ldots)$, where $f$ is a measurable function and $(\eta_t)$ is ergodic and stationary.

Since all the terms involved in $s_t(n)$ are nonnegative, $s_t$ is an increasing limit which exists in $\mathbb{R}^+ \cup \{+\infty\}$. By the Cauchy root criterion, the limit exists in $\mathbb{R}^+$ if

$$\lambda := \limsup_{n\to\infty}\left\{\prod_{k=1}^{n}a(\eta_{t-k})\right\}^{1/n} < 1.$$

By the law of large numbers, $\log\lambda = E\log a(\eta_1)$. The condition $E\log a(\eta_1) < 0$ thus guarantees the existence of the solution.
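(d) When $\beta = 0$ and the distribution of $\eta_1$ is symmetric, we have

$$E\log a(\eta_1) = \int_0^{\infty}\log(\alpha_+x)\,dP_\eta(x) + \int_{-\infty}^{0}\log(-\alpha_-x)\,dP_\eta(x) = \frac{1}{2}\log(\alpha_+\alpha_-) + E\log|\eta_1|.$$

The condition is then given by $\alpha_+\alpha_- < \exp(-2E\log|\eta_1|)$.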
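(e) If there exists a nonanticipative stationary solution such that $E\sigma_t < \infty$, then

$$E\sigma_t = \omega + Ea(\eta_1)E\sigma_t, \quad\text{that is,}\quad E\sigma_t = \frac{\omega}{1 - \alpha_+E\eta_1^+ + \alpha_-E\eta_1^- - \beta}.$$

This is possible only if $Ea(\eta_1) < 1$, that is, $\alpha_+E\eta_1^+ - \alpha_-E\eta_1^- + \beta < 1$.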
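(f) If there exists a nonanticipative stationary solution such that $E\sigma_t^2 < \infty$, then

$$E\sigma_t^2 = \omega^2 + Ea^2(\eta_1)E\sigma_t^2 + 2\omega Ea(\eta_1)E\sigma_t,$$

that is,

$$E\sigma_t^2 = \left(1 + \frac{2Ea(\eta_1)}{1 - Ea(\eta_1)}\right)\frac{\omega^2}{1 - Ea^2(\eta_1)}.$$

This is possible only if $Ea^2(\eta_1) < 1$.

(g) By the ergodic theorem, and noting that $\epsilon_t^+\epsilon_t^- = 0$, under the assumption that the moments exist we have

$$\lim_{n\to\infty}\frac{1}{n}\sum_{t=2}^{n}\epsilon_t^+\epsilon_{t-1}^+ = E\eta_t^+\,E\{(\omega + \alpha_+\epsilon_{t-1}^+ + \beta\sigma_{t-1})\epsilon_{t-1}^+\} \quad\text{a.s.}$$

and

$$\lim_{n\to\infty}\frac{1}{n}\sum_{t=2}^{n}\epsilon_t^+\epsilon_{t-1}^- = E\eta_t^+\,E\{(\omega - \alpha_-\epsilon_{t-1}^- + \beta\sigma_{t-1})\epsilon_{t-1}^-\} \quad\text{a.s.}$$

Since the distribution of $\eta_t$ is symmetric, we have

$$E\epsilon_t^+ = E\eta_t^+E\sigma_t = -E\epsilon_t^-, \qquad E\epsilon_t^+\sigma_t = -E\epsilon_t^-\sigma_t,$$

and $E(\epsilon_t^+)^2 = E(\epsilon_t^-)^2 = E\sigma_t^2/2$. Adding the two limits, since $\epsilon_{t-1} = \epsilon_{t-1}^+ + \epsilon_{t-1}^-$, we obtain a simple test statistic. To test $\alpha_+ < \alpha_-$, one can use $n^{-1}\sum_{t=2}^{n}\epsilon_t^+\epsilon_{t-1}$, which should converge to $E(\eta_t^+)(\alpha_+ - \alpha_-)E\sigma_t^2/2$.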

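4. (a) The arguments of part 3(c) show that a condition for the existence of such a solution is $E\log b(\eta_1) < 0$, with

$$b(\eta) = (\alpha_+\eta + \beta - c\alpha_+)\mathbb{1}_{\{\eta \ge c\}} + (-\alpha_-\eta + \beta + c\alpha_-)\mathbb{1}_{\{\eta < c\}} = \alpha_+(\eta - c)^+ - \alpha_-(\eta - c)^- + \beta.$$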

(b) The arguments of part 3(e) show that the condition $Eb(\eta_1) < 1$ is necessary. By Jensen's inequality, $E\log b(\eta_1) \le \log Eb(\eta_1)$. The condition $Eb(\eta_1) < 1$ is thus sufficient for the existence of a strictly stationary solution of the form

$$\sigma_t = \omega\{1 + b(\eta_{t-1}) + b(\eta_{t-1})b(\eta_{t-2}) + \cdots\}.$$

By Beppo Levi's theorem, this solution satisfies

$$E\sigma_t = \frac{\omega}{1 - Eb(\eta_1)} < \infty.$$

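(c) Let $\theta^* = (\omega^*, \alpha_+^*, \alpha_-^*, c^*, \beta^*)$ and consider the case $c^* < c$. The case $c^* > c$ is handled similarly. Denote by $R_t$ any measurable function with respect to $\sigma\{\epsilon_u, u \le t\}$. We have

$$\sigma_t = \alpha_+\sigma_{t-1}(\eta_{t-1} - c)\mathbb{1}_{\{\eta_{t-1} \ge c\}} - \alpha_-\sigma_{t-1}(\eta_{t-1} - c)\mathbb{1}_{\{\eta_{t-1} < c\}} + R_{t-2}$$

and, writing $\sigma_t^* = \sigma_t(\theta^*)$,

$$\sigma_t^* = \alpha_+^*(\sigma_{t-1}\eta_{t-1} - c^*\sigma_{t-1}^*)\mathbb{1}_{\{\sigma_{t-1}\eta_{t-1} \ge c^*\sigma_{t-1}^*\}} - \alpha_-^*(\sigma_{t-1}\eta_{t-1} - c^*\sigma_{t-1}^*)\mathbb{1}_{\{\sigma_{t-1}\eta_{t-1} < c^*\sigma_{t-1}^*\}} + R_{t-2}.$$

Assume that $\sigma_t = \sigma_t^*$ almost surely (for any $t$). Then we have

$$\eta_{t-1}(\alpha_+ - \alpha_+^*)\mathbb{1}_{\{\eta_{t-1} \ge c\}} = R_{t-2},$$
$$\eta_{t-1}(-\alpha_- - \alpha_+^*)\mathbb{1}_{\{c^* \le \eta_{t-1} < c\}} = R_{t-2},$$
$$\eta_{t-1}(-\alpha_- + \alpha_-^*)\mathbb{1}_{\{\eta_{t-1} < c^*\}} = R_{t-2}.$$

This implies that $\alpha_+ = \alpha_+^*$, $\alpha_- = \alpha_-^*$ and, if $c^* < c$, $\alpha_+^* = -\alpha_-$. Since $\alpha_+ \ne -\alpha_-$ outside the degenerate case, we have $c = c^*$, $\alpha_+ = \alpha_+^*$ and $\alpha_- = \alpha_-^*$. We then have $\omega - \omega^* + (\beta - \beta^*)\sigma_{t-1} = 0$ a.s., which entails $\omega = \omega^*$ and $\beta = \beta^*$.

(d) One could estimate $\theta$ by the quasi-maximum likelihood method, that is, with the aid of the estimator

$$\hat\theta_n = \arg\min_{\theta\in\Theta}\sum_{t=1}^{n}\left\{\frac{\epsilon_t^2}{\tilde\sigma_t^2(\theta)} + \log\tilde\sigma_t^2(\theta)\right\},$$

where $\Theta$ is a compact parameter space which constrains the parameters to be positive, and $\tilde\sigma_t(\theta)$ is defined recursively by

$$\tilde\sigma_t(\theta) = \begin{cases}\omega + \alpha_+\epsilon_{t-1} + (\beta - c\alpha_+)\tilde\sigma_{t-1}(\theta) & \text{if } \epsilon_{t-1} \ge c\tilde\sigma_{t-1}(\theta),\\ \omega - \alpha_-\epsilon_{t-1} + (\beta + c\alpha_-)\tilde\sigma_{t-1}(\theta) & \text{if } \epsilon_{t-1} < c\tilde\sigma_{t-1}(\theta),\end{cases}$$

with, for instance, $\tilde\sigma_0(\theta) = \omega$ as initial value.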
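The statistic of 3(g) and the quasi-likelihood criterion of 4(d) can be tried on simulated data. This sketch makes several assumptions:

- $\eta_t$ is Gaussian.
- The parameter values are illustrative, with $c = 0$ (so that (D.10) reduces to (D.9)) and $\alpha_+ < \alpha_-$.
- The function names are hypothetical.

The criterion is divided by $n$, which does not change its minimizer. The sketch only evaluates the criterion. Computing $\hat\theta_n$ would additionally require a numerical optimizer over $\Theta$, for instance `scipy.optimize.minimize`.

```python
import math

import numpy as np


def simulate_d10(n, omega, a_plus, a_minus, c, beta, seed=0, burn=500):
    """Simulate (D.10) with eta_t ~ N(0, 1) (an assumption); c = 0 gives (D.9)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + burn)
    eps = np.empty(n + burn)
    sigma = omega                              # initial value sigma_0 = omega
    for t in range(n + burn):
        eps[t] = sigma * eta[t]
        d = eps[t] - c * sigma                 # eps_t - c sigma_t
        sigma = omega + a_plus * max(d, 0.0) - a_minus * min(d, 0.0) + beta * sigma
    return eps[burn:]


def qml_criterion(eps, theta):
    """(1/n) sum_t {eps_t^2 / s_t^2 + log s_t^2}, s_t from the recursion of 4(d)."""
    omega, a_plus, a_minus, c, beta = theta
    sigma = omega                              # sigma_tilde_0(theta) = omega
    total = 0.0
    for x in eps:
        total += x * x / sigma ** 2 + math.log(sigma ** 2)
        d = x - c * sigma
        sigma = omega + a_plus * max(d, 0.0) - a_minus * min(d, 0.0) + beta * sigma
    return total / len(eps)


theta0 = (0.1, 0.05, 0.5, 0.0, 0.4)            # alpha_+ < alpha_-, c = 0
eps = simulate_d10(50_000, *theta0, seed=3)

# Statistic of 3(g): its limit E(eta^+)(alpha_+ - alpha_-) E(sigma^2) / 2 is negative here.
stat = float(np.mean(np.maximum(eps[1:], 0.0) * eps[:-1]))
swapped = (0.1, 0.5, 0.05, 0.0, 0.4)           # alpha_+ and alpha_- exchanged
print(stat, qml_criterion(eps, theta0), qml_criterion(eps, swapped))
```

With $\alpha_+ < \alpha_-$ the statistic should come out negative. The criterion should also be smaller at the true value than at the value with the two coefficients exchanged.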